*Speech Sciences Entries*. Speech Prosody Studies Group.
Available at: https://gepf.falar.org/entries/34

Mathematical models in Phonetics

Michael O'Dell
University of Helsinki
How to cite:

O'Dell, Michael (2022). Mathematical models in Phonetics. In:

- MATHEMATICAL MODELING

Mathematical models represent simplifications of real world processes and phenomena for the purpose of increasing understanding. Simpler models are easier to understand, but more complex models give a better fit to the complex processes of the real world. Modeling is thus always a compromise between fidelity and understanding. Mathematical modeling can be seen as an additional step after delimiting and idealizing a real world phenomenon of interest into a conceptual model. Turning a conceptual model into a precise mathematical model means further abstraction and symbolic representation as mathematical objects for the purpose of applying established mathematical theory to obtain conclusions or predictions. These conclusions are often not immediately or intuitively obvious, which is why using mathematical models can amplify our understanding of the phenomena involved. The conclusions and predictions of the model can then be compared with fresh data, possibly leading to new observations which otherwise might have gone unnoticed. Often this comparison leads to a new round of modeling focused on locating and revising the aspects of the model responsible for observed discrepancies between predictions of the model and real world data.

The application of mathematics (as a general science of patterning)
can be seen as looking for familiar patterns in novel instances or
settings. Often the cyclical modeling process starts from familiarity
with the patterns of various branches of mathematics in general and
proceeds by considering whether there might be sensible applications in
the specific empirical field of study. For example, in the history of
phonetics, applying the mathematical theory used for mechanical springs
in physics to thinking about speech production has led to characterizing
articulatory movements using concepts such as *stiffness* and *damping* [cf. e.g. 21, 4].

- DYNAMICAL MODELS

Because the study of speech is primarily concerned with concrete
events and processes occurring at many different time scales, models of
speech generally have a dynamical character. The mathematical theory of
dynamical systems is very well developed, so there are many advantages
to be gained by applying the tools of dynamical systems analysis to
modeling real world phenomena including speech. Dynamical systems theory
includes many useful concepts such as phase portrait, trajectory, and
bifurcation. It also provides a compatible language in which to describe
both physical and mental phenomena and their interaction [cf. 11,
especially Ch. 5, *Intentional Dynamics*]. A very accessible
classic introduction to dynamical systems theory with a minimum of
formal mathematics and emphasizing visualization techniques is [1]. In
phonetics, dynamical systems theory is an integral part of the very
successful theory of Task Dynamics and Articulatory Phonology [cf. e.g. 5,
21, 7, 23].

A dynamical system can be thought of in terms of two components: a *state space* consisting
of all the possible states the system could ever be in at any single
time, combined with a rule associated with each possible state telling
which state the system will be in next. One of the ways of
characterizing such systems is to ask about long term behaviour. For
instance there may be attracting states that the system tends to
approach or states the system tends to avoid. A slightly more complex
object is the so called saddle point, a state which is attracting from
some directions, but repellent in other directions.
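The two components described above can be illustrated with a minimal sketch, assuming a damped mass-spring system of the kind mentioned earlier in connection with stiffness and damping. The state is a (position, velocity) pair, and the function `step` plays the part of the rule telling which state comes next. The parameter values, the unit mass, and the function names are illustrative assumptions, not taken from any of the cited models.

```python
def step(state, dt=0.01, stiffness=4.0, damping=1.0):
    """Rule of change: map the current state to the next state."""
    position, velocity = state
    # Unit mass (an assumption): acceleration from the spring and the damper.
    acceleration = -stiffness * position - damping * velocity
    # Semi-implicit Euler: update velocity first, then position.
    new_velocity = velocity + dt * acceleration
    new_position = position + dt * new_velocity
    return (new_position, new_velocity)


def simulate(state, n_steps=5000):
    """Apply the rule repeatedly and record the trajectory through state space."""
    trajectory = [state]
    for _ in range(n_steps):
        state = step(state)
        trajectory.append(state)
    return trajectory


trajectory = simulate((1.0, 0.0))
final_position, final_velocity = trajectory[-1]
print(f"final state: ({final_position:.2e}, {final_velocity:.2e})")
```

With positive damping, the trajectory spirals in towards the state (0, 0), which acts as an attracting state for this system.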

A concept from dynamical systems theory which has proved useful in phonetics is the so called *limit cycle*:
a repetitious trajectory or state sequence that the system tends to
settle into. Such oscillatory behavior is ubiquitous in speaking, from
turn taking in conversation to the repetitious vibration of the vocal
folds during voiced portions of speech. Quite complex behavior can arise
in dynamical systems where several limit cycles interact (“coupled
oscillators”), possibly leading to roughly synchronous behavior with
complex rhythms [cf. e.g. 19]. Coupled oscillator models have been used
to model several aspects of speech timing [cf. e.g. 3, 2, 15, 16].
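As a rough illustration of how coupling can produce synchronous behavior, the following sketch simulates two phase oscillators with slightly different natural frequencies. It is a generic textbook-style phase model, not the specific speech timing models cited above; the frequencies (1.0 and 1.2 cycles per time unit), the coupling strength, and the function name are assumptions chosen for illustration.

```python
import math


def final_phase_difference(coupling, n_steps=20000, dt=0.001):
    """Integrate two coupled phase oscillators and return theta_2 - theta_1."""
    omega_1 = 2.0 * math.pi * 1.0  # natural frequency of oscillator 1 (assumed)
    omega_2 = 2.0 * math.pi * 1.2  # natural frequency of oscillator 2 (assumed)
    theta_1, theta_2 = 0.0, 0.0
    for _ in range(n_steps):
        # Each oscillator is pulled towards the phase of the other.
        rate_1 = omega_1 + coupling * math.sin(theta_2 - theta_1)
        rate_2 = omega_2 + coupling * math.sin(theta_1 - theta_2)
        theta_1 += dt * rate_1
        theta_2 += dt * rate_2
    return theta_2 - theta_1


drifting = final_phase_difference(coupling=0.0)
locked = final_phase_difference(coupling=2.0)
print(f"uncoupled phase difference: {drifting:.2f} rad")
print(f"coupled phase difference:   {locked:.2f} rad")
```

Without coupling the phase difference grows without bound. With sufficiently strong coupling it settles at a constant value, so the two oscillators run at a common frequency with a fixed phase lag.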

An area of dynamical systems theory which has received a great deal
of attention recently (for instance in the fields of machine learning
and cognitive processing) concerns systems with many saddle points
chained together into so called heteroclinic networks (also known as *winnerless competition models*). The neighborhoods of saddle points in such a system act as *temporarily* stable (or *metastable*)
macro states: behavior which is initially attracted towards such a
state is eventually repelled towards the next saddle point and its
corresponding region of state space. Interestingly such a system is
capable of (re)producing different sequences of behavior based on small
changes in its parameters, leading to the marriage of *reproducibility and flexibility of transient behavior* [20].
Heteroclinic networks have not as yet received much attention in
phonetics research, but perhaps the time is ripe. Applied to speech, it
is easy to think of the saddle points as corresponding to sequential
phonetic categories, such as consonants and vowels, syllables,
articulatory gestures, etc. In a heteroclinic network the saddle point
targets are approached (and corresponding categories realized) in
sequence, but transitions between categories are continuous, just as in
real world speech.
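A minimal sketch of such sequential switching, assuming a three-unit generalized Lotka-Volterra system with asymmetric inhibition (a standard toy example of winnerless competition, not a model of any particular speech data). The inhibition strengths `alpha` and `beta`, the initial state, and the small constant inflow `eps` (a deterministic stand-in for noise that keeps suppressed units from vanishing) are all illustrative assumptions.

```python
def winner_sequence(alpha=0.5, beta=2.0, eps=1e-6, dt=0.01, n_steps=20000):
    """Return the order in which units become the most active one."""
    # rho[i][j]: how strongly unit j inhibits unit i (asymmetric by design).
    rho = [[1.0, alpha, beta],
           [beta, 1.0, alpha],
           [alpha, beta, 1.0]]
    x = [0.6, 0.3, 0.1]  # initial activity levels (assumed)
    winners = []
    for _ in range(n_steps):
        leader = max(range(3), key=lambda i: x[i])
        if not winners or winners[-1] != leader:
            winners.append(leader)
        inhibition = [sum(rho[i][j] * x[j] for j in range(3)) for i in range(3)]
        # Euler step of dx_i/dt = x_i * (1 - inhibition_i) + eps.
        x = [x[i] + dt * (x[i] * (1.0 - inhibition[i]) + eps) for i in range(3)]
    return winners


sequence = winner_sequence()
print(sequence)
```

Each unit dominates for a while (the neighborhood of one saddle point) before activity passes continuously to the next unit, so the order of the sequence is determined by the pattern of inhibition rather than by an external clock.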

- STOCHASTIC MODELS

General mathematical models include parameters which are unknown and
need to be estimated from empirical data. One approach is to find values
of the parameters which minimize an “error function” based on some
measurable quantity (empirical data) compared with the corresponding
values predicted by the model for given parameter values. This approach
might be called a curve fitting approach, because empirical data can be
seen as dictating certain points in a graph, and modeling as finding a
curve to fit, or at least approximate the data points. Even though the
model considered may be a deterministic model which does not formally
incorporate randomness (i.e. it is not a *stochastic* model
described using probability distributions), such a curve fitting
approach implies at least some stochasticity (or randomness), at least
if some (minimized) amount of error is tolerated without rejecting the
model. At the very least it implies that the empirical measurements
themselves are inexact and include some degree of uncertainty. A more
explicit form of the model then would be a stochastic model in order to
include this measurement uncertainty.
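The curve fitting approach can be sketched for the simplest case, a straight line fitted by minimizing the sum of squared errors. The data here are simulated (a known line plus random measurement error), and the function names and numerical values are illustrative assumptions.

```python
import random


def fit_line(xs, ys):
    """Closed-form least squares estimates of intercept and slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_error(params, xs, ys):
    """Error function: squared distance between data and model predictions."""
    intercept, slope = params
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


rng = random.Random(0)
xs = [i / 10 for i in range(50)]
# Simulated measurements: true line 2.0 + 0.5 * x plus Gaussian error.
ys = [2.0 + 0.5 * x + rng.gauss(0.0, 0.1) for x in xs]

best = fit_line(xs, ys)
print(f"estimated intercept and slope: {best[0]:.3f}, {best[1]:.3f}")
print(f"remaining error: {sum_squared_error(best, xs, ys):.3f}")
```

The fitted parameters are close to, but not identical with, the values used to generate the data, and the remaining error is not zero. Tolerating that residual error is what implicitly treats the measurements as uncertain.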

Another source of uncertainty is indeterminacy inherent in the process being modeled, in other words when the process itself is considered to be non-deterministic, so that future behavior cannot be exactly predicted in principle, even given exact knowledge of previous behavior. Of course, in practice, exact knowledge of previous behavior and all the myriad small influences of the environment is impossible, so there is also error (modeled as random) caused by the multitude of real world factors that have been excluded from the simplified model. This means that mathematical models need to have stochastic elements allowing the model to be fit to real data.

*Bayesian analysis of models and data.* If the error function
used to find optimal values for model parameters is considered to
represent a probability distribution, then curve fitting can be seen as
an attempt to find the *most likely* parameter values, that is, the values under which the observed data are most probable. In this setting the parameter values found are called maximum likelihood estimates (MLE).

Rather than settling only for the most likely values, Bayesian
analysis attempts to find a complete probability distribution
characterizing the uncertainty after taking empirical data into account.
This so called *posterior distribution* thus shows not only
which parameter values are likely, but also the degree of uncertainty
still remaining. In order to estimate this distribution, however, it is
necessary to start with a *prior distribution* (characterizing the uncertainty *before* taking
empirical data into account) in addition to the data. The resulting
posterior distribution can be viewed as an updated version of this
characterization taking the new data into consideration.
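The updating step can be sketched with a grid approximation for a single unknown probability. The scenario is hypothetical: a listener gives 7 "long" responses in 10 trials, and the unknown is the probability of a "long" response. The flat prior, the grid resolution, and the function name are assumptions made for illustration.

```python
def grid_posterior(successes, trials, prior, grid):
    """Multiply prior by likelihood at each grid point, then normalize."""
    failures = trials - successes
    unnormalized = [p * (theta ** successes) * ((1.0 - theta) ** failures)
                    for theta, p in zip(grid, prior)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


grid = [i / 1000 for i in range(1001)]        # candidate parameter values
flat_prior = [1.0 / len(grid)] * len(grid)    # vague prior: all values equal
posterior = grid_posterior(7, 10, flat_prior, grid)

posterior_mean = sum(t * p for t, p in zip(grid, posterior))
prob_above_half = sum(p for t, p in zip(grid, posterior) if t > 0.5)
print(f"posterior mean: {posterior_mean:.3f}")
print(f"P(theta > 0.5): {prob_above_half:.3f}")
```

The result is a whole distribution over the parameter rather than a single best value, so questions about remaining uncertainty, such as the probability that the parameter exceeds 0.5, can be answered directly.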

In addition to measurement error, Bayesian analysis can be used to evaluate uncertainty associated with any unknown values in the model, including values of parameters at various hierarchical levels in the model, as well as missing data, once the prior distributions are specified. In practice, prior distributions are generally used which restrict values as little as possible (so called vague or uninformative prior distributions), so as to give the greatest weight possible to the data. For instance, Yoshida et al. [25] analyzed the results of perception experiments with Finnish and Japanese listeners using a Bayesian hierarchical logistic model with “vague priors that allowed a very broad range of values relative to the scale of the data, whereby the prior has minimal influence on the posterior distribution”. If the data being taken into account are at all relevant to the model in question, the posterior distribution will typically be more exact (“narrower”, more informative) than the prior distribution. An excellent general reference for probability theory in a Bayesian setting is [9]. A recent exposition of Bayesian methods in phonetics can be found in [24].

Kruschke [12] is an excellent guide to carrying out Bayesian analysis in practice. Typically a mathematically precise determination of the required posterior distribution is intractable, except in very simple special cases. Instead, the posterior distribution is represented by a large sample of values from that distribution. Any information of interest about the posterior distribution (probability that some parameter is greater than zero, probability that one parameter is greater than another, etc.) can be estimated from this sample. Computationally there are efficient methods (notably so called Markov Chain Monte Carlo or MCMC) for generating the sample. The size of the generated sample, and thus the accuracy of the representation, is limited only by the computation time used and the amount of storage space available.
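To make the idea of representing a posterior by a sample concrete, here is a minimal random-walk Metropolis sampler, one of the simplest MCMC methods. It is a sketch only, not the algorithm used by JAGS or Stan. The target is the posterior for a single probability after a hypothetical 7 successes in 10 trials with a flat prior; the step size, seed, burn-in length, and function names are assumptions.

```python
import math
import random


def log_posterior(theta, successes=7, trials=10):
    """Unnormalized log posterior with a flat prior on (0, 1)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def metropolis(n_samples, step=0.2, seed=1):
    """Random-walk Metropolis: propose a nearby value, accept or stay put."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


samples = metropolis(20000)[1000:]  # discard an initial burn-in portion
sample_mean = sum(samples) / len(samples)
prob_above_half = sum(1 for s in samples if s > 0.5) / len(samples)
print(f"posterior mean estimate: {sample_mean:.3f}")
print(f"estimated P(theta > 0.5): {prob_above_half:.3f}")
```

Quantities of interest are then simple summaries of the sample, for example the proportion of sampled values above 0.5.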

- STOCHASTIC DYNAMICAL MODELS

In a deterministic dynamical system the rule associated with each state gives a unique result, which in turn means that if the system is in a particular state at a particular time, the entire future development will always be the same. From the point of view of the modeler, there may be uncertainty concerning the exact nature of the rules of change of a dynamical system, or concerning the exact state the system is in. Thus a statistical analysis is often appropriate, even in the case of a deterministic model.

Some models describe average behaviour in a system with so many
interacting individuals that it is not feasible to follow the
development of each individual. Instead the total population of
individual behaviors is modeled using statistical tools. This strategy
is used, for instance, in physics to model the behavior of gases (*thermodynamics*),
and in biology to model changes in animal populations. It is a possible
strategy regardless of whether or not individual behaviour is
considered to follow deterministic rules.

In phonetics, exemplar-theoretic models [cf. e.g. 18, 10] apply this
idea to explain how individual speakers can exhibit variable behavior
apparently governed by probability distributions. The question has been
raised, how can individuals store and manipulate such abstract
distributions? Exemplar models are based on the idea that an individual
accumulates a large collection of examples or *exemplars*, which
can approximate a distribution (similar to the use of a large sample in
Bayesian MCMC methods when exact probability distributions are
intractable). Of course these exemplars (or traces of past experiences)
are abstractions from the original concrete events, but it is assumed
that they are rich in contextual information and associations. Therefore
any statistical trend in the previous experience of the individual is
automatically reflected in subsequent behavior, which is guided by the
collection of previously accumulated exemplars. Among other effects,
various phonetic categories emerge in the learner from relatively dense
clouds of exemplars [for a recent example of this type of analysis, cf.
22]. Accumulated exemplars of past communicative experience are
intimately connected to the communicative network an individual is
immersed in, and in sociophonetics exemplar theory is readily extended
from individuals to entire populations of speakers and their social
interactions [cf. e.g. the articles in 8].
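The basic mechanism can be sketched in a few lines, under strong simplifying assumptions: each exemplar is reduced to a single number (a hypothetical vowel duration in milliseconds) with a category label, similarity decays exponentially with distance, and production picks a stored exemplar at random and adds noise. None of these choices is taken from the cited models; they only illustrate how a stored collection can stand in for a distribution.

```python
import math
import random


def similarity(a, b, scale=10.0):
    """Similarity decays exponentially with distance (assumed form)."""
    return math.exp(-abs(a - b) / scale)


def categorize(token, exemplars):
    """Assign the label whose stored exemplars are jointly most similar."""
    scores = {}
    for value, label in exemplars:
        scores[label] = scores.get(label, 0.0) + similarity(token, value)
    return max(scores, key=scores.get)


def produce(label, exemplars, rng, noise=2.0):
    """Produce a token by reusing a random stored exemplar, plus noise."""
    candidates = [value for value, stored in exemplars if stored == label]
    return rng.choice(candidates) + rng.gauss(0.0, noise)


rng = random.Random(0)
# Hypothetical past experience: durations for two quantity categories.
exemplars = ([(rng.gauss(80.0, 10.0), "short") for _ in range(200)]
             + [(rng.gauss(160.0, 20.0), "long") for _ in range(200)])

# One perception-production cycle: hear a token, store it, produce a new one.
heard = 85.0
label = categorize(heard, exemplars)
exemplars.append((heard, label))
spoken = produce(label, exemplars, rng)
print(label, round(spoken, 1))
```

No explicit probability distribution is stored anywhere in this sketch. Statistical tendencies in perception and production arise from the accumulated collection itself.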

A deeper form of stochasticity arises in dynamical models that contain inherent uncertainty regarding the future development from any state. In such a system the exact trajectory starting from a given state would vary from one instantiation to the next, even if all relevant parameters were known exactly. Whereas dynamical systems are often modeled with the help of differential equations, such stochastic systems may be modeled using stochastic differential equations. In some cases actual individual trajectories in such models can be thought of as belonging to an infinite-dimensional probability distribution, with each point in time representing a separate dimension (e.g. in Gaussian process theory).

In the usual case of a continuous state space dynamical system, the number of states will be infinite, and the rule of change is a vector determining the direction and magnitude of change at each state. With a stochastic rule of change the system develops like a (drunken) walker that takes steps in random directions (and such systems are sometimes referred to as random walks), but more often in some directions (“downhill”) than others (“uphill”). Associated with state space then is a “landscape” of sorts that influences the direction of change without determining it completely.
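A stochastic rule of change of this kind can be sketched with the Euler-Maruyama scheme, a standard way of simulating stochastic differential equations numerically. The landscape here is assumed to be a single valley, V(x) = x²/2, and the noise level, step size, and seeds are illustrative choices.

```python
import math
import random


def landscape_slope(x):
    """Slope of the assumed landscape V(x) = x**2 / 2 (one valley at x = 0)."""
    return x


def euler_maruyama(x0, n_steps, dt=0.01, sigma=0.5, seed=0):
    """Simulate dx = -V'(x) dt + sigma dW with the Euler-Maruyama scheme."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n_steps):
        drift = -landscape_slope(x) * dt                    # tendency downhill
        kick = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)  # random step
        x = x + drift + kick
        path.append(x)
    return path


# Same starting state and parameters, two different realizations.
path_a = euler_maruyama(2.0, 20000, seed=1)
path_b = euler_maruyama(2.0, 20000, seed=2)
tail = path_a[10000:]
tail_mean = sum(tail) / len(tail)
print(f"end points: {path_a[-1]:.3f} and {path_b[-1]:.3f}")
print(f"long-run average of first path: {tail_mean:.3f}")
```

The two trajectories start from the same state but differ in detail, while both tend to stay near the bottom of the valley: the landscape shapes the behavior without determining it completely.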

In more complex models it is even possible for the landscape itself
to exhibit dynamical behaviour, in other words for past behavior to
influence the form of the landscape in the future. An early example of
such a model is Grassé’s theory of *stigmergy* [6], first used to model insect path building behavior. Lam has called such models *active walker models* [13,
14] in opposition to other random walk models with an immutable
landscape. The essential idea is that although the landscape influences
action, so that some actions (or routes) are more attractive than
others, past actions also change the landscape (locally), increasing the
attractiveness of those actions in the future. Applied to language
typology, such models could offer an explanation as to how it is
possible for some languages to exhibit relative stability of nonoptimal
phonetic patterns (such as large V.O.T. values for stops in a language
where voicing and aspiration do not distinguish meaning). In addition to
ease of articulation and ease of perception, phonetic behavior can be
seen as shaped by ease of repetition—just as a naturally evolved path
from A to B does not necessarily represent the theoretically optimal
route between them.
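The essential feedback loop can be sketched as follows, assuming a deliberately reduced setting: a walker repeatedly chooses one of several routes from A to B with probability proportional to each route's current attractiveness, and every use makes the chosen route slightly more attractive. The number of routes, the initial attractiveness values, and the reinforcement amount are hypothetical, and the sketch is not an implementation of the models in [13, 14].

```python
import random


def active_walk(base_attractiveness, n_trips=2000, reinforcement=0.1, seed=0):
    """Choose routes at random; each trip reinforces the route taken."""
    rng = random.Random(seed)
    landscape = list(base_attractiveness)  # the landscape changes over time
    counts = [0] * len(landscape)
    for _ in range(n_trips):
        # The landscape influences the choice without determining it.
        route = rng.choices(range(len(landscape)), weights=landscape)[0]
        counts[route] += 1
        # Past action changes the landscape locally.
        landscape[route] += reinforcement
    return landscape, counts


# Route 0 is assumed to be the theoretically best one at the outset.
base = [1.2, 1.0, 1.0]
landscape, counts = active_walk(base)
print("trips per route:", counts)
print("final attractiveness:", [round(a, 1) for a in landscape])
```

Because early random choices are amplified by reinforcement, repeated runs with different seeds need not end up favoring the route that was initially best, which is the sense in which an evolved path can remain stable without being optimal.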

- MORE MODELS ...

Obviously this is only the merest beginning of an exposition of types and features of mathematical models useful in phonetics, and an exhaustive list is certainly not even possible. It is hoped, however, that this small glimpse into mathematical descriptive tools will encourage the reader to think abstractly about concrete speech in order to discover new instances of well understood mathematical patterns, and perhaps also to discover new patterns in speech which call for new or refined mathematical models to aid in understanding.

REFERENCES

[1] Ralph Abraham and Christopher D. Shaw. *Dynamics: The Geometry of Behavior*. 2nd ed. Addison-Wesley, 1992.

[2] Plínio A. Barbosa. “From Syntax to Acoustic Duration: A Dynamical Model of Speech Rhythm Production”. In: *Speech Communication* 49 (2007), pp. 725–742.

[3] Plínio A. Barbosa and Sandra Madureira. “Toward a Hierarchical Model of Rhythm Production: Evidence from Phrase Stress Domains in Brazilian Portuguese”. In: [17], pp. 297–300.

[4] Catherine P. Browman and Louis Goldstein. “Gestural Specification using Dynamically-Defined Articulatory Structures”. In: *Journal of Phonetics* 18 (1990), pp. 299–320.

[5] Catherine P. Browman and Louis Goldstein. “Articulatory Phonology: An Overview”. In: *Phonetica* 3-4 (1992), pp. 155–180.

[6] Pierre-Paul Grassé. “La Reconstruction du nid et les Coordinations Inter-Individuelles chez Bellicositermes Natalensis et Cubitermes sp. La théorie de la Stigmergie: Essai d’interprétation du Comportement des Termites Constructeurs”. In: *Insectes Sociaux* 6 (1959), pp. 41–84.

[7] Sarah Hawkins. “An Introduction to Task Dynamics”. In: *Papers in Laboratory Phonology II: Gesture, Segment, Prosody*. Cambridge University Press, 1992, pp. 9–25.

[8] Stefanie Jannedy and Jennifer Hay, eds. *Journal of Phonetics* 4 (2006): *Modelling Sociophonetic Variation*. (special issue).

[9] Edwin T. Jaynes. *Probability Theory: The Logic of Science*. 2003. URL: https://bayes.wustl.edu/etj/prob/book.pdf.

[10] Keith Johnson. “Resonance in an Exemplar-Based Lexicon: The Emergence of Social Identity and Phonology”. In: *Journal of Phonetics* 4 (2006), pp. 485–499.

[11] J. A. Scott Kelso. *Dynamic Patterns: The Self Organization of Brain and Behavior*. MIT Press, 1995.

[12] John K. Kruschke. *Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan*. 2nd ed. Waltham, MA: Academic Press, 2015.

[13] Lui Lam. “Active Walks: The First Twelve Years (Part I)”. In: *International Journal of Bifurcation and Chaos* 8 (2005), pp. 2317–2348.

[14] Lui Lam. “Active Walks: The First Twelve Years (Part II)”. In: *International Journal of Bifurcation and Chaos* 2 (2006), pp. 239–268.

[15] Michael O’Dell and Tommi Nieminen. “Coupled Oscillator Model of Speech Rhythm”. In: [17], pp. 1075–1078.

[16] Michael O’Dell and Tommi Nieminen. “Coupled Oscillator Model for Speech Timing: Overview and Examples”. In: *Nordic Prosody: Proceedings of the Xth Conference, Helsinki 2008*. Ed. by Martti Vainio, Reijo Aulanko, and Olli Aaltonen. Peter Lang, 2009, pp. 179–189.

[17] J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, and A. Bailey, eds. *Proceedings of the XIVth International Congress of Phonetic Sciences*. University of California, Berkeley. 1999.

[18] Janet B. Pierrehumbert. “Exemplar Dynamics: Word Frequency, Lenition and Contrast”. In: *Frequency and the Emergence of Linguistic Structure*. Ed. by Joan Bybee and Paul Hopper. Amsterdam: John Benjamins, 2001, pp. 137–157.

[19] Arkady Pikovsky, Michael Rosenblum, and Jürgen Kurths. *Synchronization: A Universal Concept in Nonlinear Sciences*. Cambridge University Press, 2001.

[20] Mikhail I. Rabinovich, Ramón Huerta, Pablo Varona, and Valentin S. Afraimovich. “Transient Cognitive Dynamics, Metastability, and Decision Making”. In: *PLoS Computational Biology* 5 (2008), e1000072.

[21] Elliot L. Saltzman and Kevin G. Munhall. “A Dynamical Approach to Gestural Patterning in Speech Production”. In: *Ecological Psychology* 4 (1989), pp. 333–382.

[22] Antje Schweitzer. “Exemplar-Theoretic Integration of Phonetics and Phonology: Detecting Prominence Categories in Phonetic Space”. In: *Journal of Phonetics* 100915 (2019), pp. 1–20.

[23] Juraj Šimko and Fred Cummins. “Sequencing and Optimization within an Embodied Task Dynamic Model”. In: *Cognitive Science* 3 (2011), pp. 527–562.

[24] Shravan Vasishth, Bruno Nicenboim, Mary E. Beckman, Fangfang Li, and Eun Jong Kong. “Bayesian Data Analysis in the Phonetic Sciences: A Tutorial Introduction”. In: *Journal of Phonetics* 71 (2018), pp. 147–161.

[25] Kenji Yoshida, Kenneth J. de Jong, John K. Kruschke, and Pia-Maria Päiviö. “Cross-language Similarity and Difference in Quantity Categorization of Finnish and Japanese”. In: *Journal of Phonetics* 50 (2015), pp. 81–98.