Stress group and accent phrase
Philippe Martin | LLF, UFRL, Université de Paris


We don’t read a text word by word, but group of words by group of words (Passy, 1891). Consider what segmenting the flow of speech word by word would look like, transcribed with a final dot after each word: Among. Other. Aspects. This. Would. Be. Highly. Inefficient. And. Slow. Down. Our. Speed. Of. Reading. Instead, we group words into chunks when reading and speaking, to achieve better fluency. The same observation applies to spontaneous speech, whether oral or silent.

These groups of words have some interesting properties. Firstly, they include one and only one so-called content word, belonging to the class of verbs, adverbs, nouns or adjectives, associated with some grammatical words (prepositions, pronouns, conjunctions). Interestingly, grammatical words belong to closed classes, to which it is difficult or impossible to add new items, whereas content words belong to open classes, into which new words can easily be introduced in the course of language change.

Secondly, content words bear a stressed syllable, which is perceptually and acoustically more prominent, essentially thanks to the characteristics of its vowel. This stressed syllable is located somewhere in the content word. In Romance languages (except French), stressed syllables typically mark the boundary of an internal morphological structure of the word (as in Portuguese república, for example). The presence, and therefore the realization, of that stressed syllable is a property of the content word and does not result from a speaker's choice. In other words, the stressed syllable of a content word is a property of morphology, whereas so-called emphatic stress, which can also fall on a grammatical word, results from a decision made by the reader or speaker.

Accent Phrases

From these two properties, we can conclude that the groups of words resulting from reading a text, orally or silently, contain a unique content word, and therefore a unique stressed syllable, again excluding emphatic stress. Because they bear a unique stressed syllable (actually a unique stressed vowel), the groups of words resulting from the segmentation of speech into chunks are called accent phrases (AP). Accent phrases result from the segmentation of text when we read, either silently or orally, but also when we speak spontaneously, silently (internal monologue) or orally. This unique accent phrase stress is called a pitch accent in the autosegmental-metrical model.

These considerations are valid for the so-called lexically stressed languages, such as English or Portuguese. The term lexically stressed refers to the mandatory stressed syllable that a priori falls on any content word. However, there are some non-lexically stressed languages, such as French and Korean. Being non-lexically stressed does not mean that there is no mandatory stress and no accent phrases; it just indicates that the position of syllabic stress is not determined by the lexicon. Therefore, in these languages, the stressed syllable of an accent phrase must be independent of any word property, which leaves only one possibility: it must be located on the first or the last syllable of the accent phrase.

It’s about time

What would limit the size of accent phrases if there is no other rule than for stress to be positioned on their first or last syllable? The answer is timing, i.e. the time it takes to pronounce or read an accent phrase, orally or silently. Of course, the duration of the oral pronunciation of an accent phrase will depend on the number of words it contains, and more precisely on its number of syllables, as well as on the structure of these syllables. For instance, we can expect that an accent phrase with only one syllable will take less time to say aloud than an accent phrase with five syllables or more. For English or Portuguese, we are constrained by lexical stress rules, which force us to stress a syllable in every content word and so limit the size of accent phrases to the size of these content words, possibly augmented by some grammatical words. For French or Korean, however, there is no limit dictated by morphological rules: the limit is of a rhythmic nature, determined by the time it takes to pronounce an accent phrase orally.

To find this limit experimentally, we can say aloud accent phrases containing an increasing number of syllables. We then discover that if we exceed some 8 or 9 syllables, which supposes a really fast speech rate (characteristic of today's French parole des jeunes, “young people's speech”), we have to create an extra syllabic stress somewhere inside the intended accent phrase, since there is no linguistic rule that positions stress in a non-lexically stressed language other than being final (or initial). A classical example in French is la petite armoire verte “the little green cabinet”, which can be pronounced with only one final stressed syllable, whereas la petite armoire vert-bouteille “the little bottle-green cabinet” requires two stressed syllables.

Alternatively, we can experiment with the pronunciation of long, rare and technical words built from Latin and Greek roots, such as French paraskevidékatriaphobie (fear of Friday the 13th) or inconstitutionnalité, or English antidisestablishmentarianism (opposition to the disestablishment of the Church of England). At least one additional stressed syllable is required in the pronunciation of these long words, as, for example, in incONstitutionnalitÉ or paraskevIdékatriaphobIe in French, and antidisestAblishmentArianism or paraskevidekatriaphobIa in English (stressed vowels in capitals). This shows that the long-word stress constraint applies to lexically stressed languages as well.

Remarkably, this limit characterizing accent phrases is not directly linked to the number of syllables, but to the time it takes to pronounce them. This means that, for French or Korean, the segmentation into accent phrases will depend on the speech rate. A slow speech rate will result in accent phrases with few syllables, so that the segmentation tends towards word-by-word phrasing, possibly even reaching a syllable-by-syllable segmentation. By contrast, a fast speech rate will group more words, whether content or grammatical, in each accent phrase. A typical average speech rate is about 4 to 5 syllables per second, whereas fast speakers, such as those recently observed among young speakers in France, reach some 8 to 9 or even 10 syllables per second. Whatever the speech rate, there is a limit: the time to pronounce a single accent phrase cannot exceed some 1250-1350 ms, as experimentally established by Martin (2014, 2018).
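The relation between speech rate and accent phrase size can be illustrated with a minimal sketch. It assumes, as a simplification, that all syllables have the same duration, and it uses the lower end of the 1250-1350 ms range as the ceiling; the function name and default value are illustrative choices, not part of the cited experiments.

```python
import math

# Ceiling on accent phrase duration, in seconds. The text reports 1250-1350 ms;
# taking the lower end is an assumption made for this illustration.
MAX_AP_DURATION_S = 1.25


def max_syllables_per_accent_phrase(rate_syll_per_s, max_duration_s=MAX_AP_DURATION_S):
    """Largest whole number of syllables that fits in one accent phrase.

    Assumes a constant syllable duration of 1 / rate_syll_per_s seconds.
    """
    if rate_syll_per_s <= 0:
        raise ValueError("speech rate must be positive")
    # The small epsilon guards against floating-point rounding just below an integer.
    return math.floor(rate_syll_per_s * max_duration_s + 1e-9)


if __name__ == "__main__":
    for rate in (2, 4, 5, 8):
        print(rate, "syll/s ->", max_syllables_per_accent_phrase(rate), "syllables at most")
```

Under these assumptions, a slow rate of 2 syllables per second leaves room for only 2 syllables per accent phrase, while a rate of 8 syllables per second allows up to 10, which is consistent with the tendency described above.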

Surprisingly enough, the time it takes to read an accent phrase silently is almost independent of its number of syllables. For most of us, when we read silently, we also hear a little voice “in our head”, which generates a complete virtual oral pronunciation without any actual physical sound, together with the stressed syllables of the accent phrases and their associated melody. It is then possible to evaluate the time it takes to read a given text silently, and to relate it to the number of accent phrases, which in a lexically stressed language is equal to the number of content words in the text. By timing the silent reading of the text, we can obtain an average reading duration for an accent phrase. The answer will of course depend on our expertise in (fast) reading, but if we don’t skip any word in the process, the fastest silent reading speed gives a minimum duration of about 250 ms per accent phrase, again whatever the content of the accent phrases. Equivalent values are obtained with non-lexically stressed languages, taking into account the fact that the number of actual accent phrases depends on the reading rate, since more than one content word can be part of a single accent phrase. For instance, la ville de Paris “the city of Paris” would form a single accent phrase with one final stress in fast silent reading (la ville de ParIs, stressed vowels in capitals), and two accent phrases in slow silent reading (la vIlle de ParIs).
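The estimate described above amounts to a simple division, sketched here under stated assumptions: the reading time and the accent phrase count are hypothetical values supplied by the experimenter, and the function name is illustrative.

```python
def mean_accent_phrase_reading_time_ms(total_reading_time_s, accent_phrase_count):
    """Average silent reading time per accent phrase, in milliseconds."""
    if accent_phrase_count <= 0:
        raise ValueError("the text must contain at least one accent phrase")
    return 1000.0 * total_reading_time_s / accent_phrase_count


if __name__ == "__main__":
    # Hypothetical measurement: a text with 100 content words (hence 100 accent
    # phrases in a lexically stressed language) read silently in 30 seconds.
    print(mean_accent_phrase_reading_time_ms(30.0, 100))  # 300.0
```

A result close to 250 ms would indicate a reader near the fastest silent reading speed mentioned above; larger values indicate slower reading.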

Stress clash

Interestingly, there is also a lower limit to the accent phrase duration. Actually, this limit does not pertain to the accent phrase itself, but to the interval between two consecutive stressed syllables (i.e. not separated by any unstressed syllable in continuous speech). Numerous papers on the so-called stress clash condition give us some hints: in order to be realized (and perceived) as stressed, consecutive stressed syllables must be separated by something belonging to the structure of the syllables involved, such as groups of consonants taking a relatively long time to pronounce. Otherwise, the first stress in the sequence, normally placed on the last syllable of the first word, must move somewhere else in that word, as in the famous examples thirteen men or kangaroo saddle. Pronounced slowly, the stress pattern would be thirtEEn mEn and kangarOO sAddle (stressed vowels in capitals), leaving enough of a gap between consecutive stressed syllables. With a fast speech rate, however, the time gap becomes too short and the first stress moves to a reserved position, as in thIrteen mEn and kAngaroo sAddle. Still, for lexically stressed languages, the accent phrases remain the same, each containing a content word. A similar example in Portuguese would be café quente “hot coffee”, with a stress shift when a fast speech rate reduces the gap between café and quente (cAfé quEnte), and no stress shift at a slow speech rate (cafÉ # quEnte). For non-lexically stressed languages, there is no reserved position for the first stress, which simply disappears, possibly giving rise to a larger accent phrase, as in la ville de Paris.

The intriguing part is that when the time gap between consecutive stressed syllables is manipulated, the first syllable, although it keeps the same acoustic characteristics (i.e. vowel and syllabic duration, melodic and intensity values), ceases to be perceived as stressed when the gap falls below a threshold of some 250 ms (Martin, 2018).
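The threshold effect can be sketched as a simple check on the intervals between successive stressed syllables. This is only an illustration: measuring the gap from one stress onset to the next is an assumption made here, and the function name and sample timings are hypothetical.

```python
# Minimum gap, in milliseconds, reported in the text for two consecutive
# stressed syllables to both be perceived as stressed.
MIN_STRESS_GAP_MS = 250


def perceived_as_stressed(stress_onsets_ms, min_gap_ms=MIN_STRESS_GAP_MS):
    """For each stressed syllable, tell whether it is expected to be perceived as stressed.

    A stress is flagged False when the next stress follows it by less than
    min_gap_ms; the last stress of the sequence is always flagged True.
    Onsets are assumed to be sorted in increasing order.
    """
    flags = []
    for i, onset in enumerate(stress_onsets_ms):
        is_last = i == len(stress_onsets_ms) - 1
        flags.append(is_last or stress_onsets_ms[i + 1] - onset >= min_gap_ms)
    return flags


if __name__ == "__main__":
    # Hypothetical onsets: the first two stresses are only 180 ms apart.
    print(perceived_as_stressed([0, 180, 600]))  # [False, True, True]
```

In this toy example the first stress is followed too closely by the second and is therefore flagged as not perceived, which corresponds to the stress shift or stress deletion described for fast speech.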

Plasticity and eurhythmy

Accent phrases present another interesting property, long noticed by practitioners of language teaching, particularly for French: the plasticity of syllabic duration inside the accent phrase. Experimental data (Martin, 2014) show that the average syllabic duration tends to be shorter in accent phrases with a large number of syllables (up to 9 or 10), and conversely longer when accent phrases contain few syllables. Furthermore, there is also a eurhythmic effect (Wioland, 1985) favoring the realization of successive accent phrases of comparable duration. This is obtained in spontaneous speech by expanding or compressing the average syllabic duration of the accent phrase, and in read speech, in French, by selecting accent phrases with comparable numbers of syllables (a flexibility available in non-lexically stressed languages, where two content words can be merged into a single accent phrase). This suggests that, in spontaneous speech, speakers plan the duration of accent phrases before setting up a syntactic pattern and the selected lexical items (Blanche-Benveniste, 2003).

It’s in the brain

The question is: why do we need stressed syllables in the first place? The intuitive answer is, of course, to process speech and text in small chunks, the accent phrases. Stressed syllables appear as tags synchronizing speech segmentation, whether oral or silent, spontaneous or read. Interestingly, these time marks operate in a given range, with a minimum of 250 ms and a maximum of 1250-1350 ms (depending on the speaker). This time interval corresponds closely to the period of delta brain wave oscillations, i.e. 0.8 Hz to 4 Hz (Martin, 2018), in the same way that syllabic perception is synchronized by theta brain waves, with periods in the range from 100 ms to 250 ms, i.e. 4 Hz to 10 Hz (Ghitza, 2011).

The above-mentioned experimental data clearly favor the hypothesis linking accent phrases to delta brain oscillations. The silent reading rate is limited to 4 accent phrases per second, i.e. 250 ms per accent phrase. Likewise, two consecutive stressed syllables must be separated by a minimum of 250 ms to both be perceived as stressed. At the other end of the time scale, stressed syllables cannot be separated by more than 1250-1350 ms in continuous speech. If this limit is exceeded, an extra stressed syllable is necessarily produced by speakers, or even by readers in silent reading, where of course no actual acoustic event is generated!
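The correspondence between these time limits and oscillation frequencies follows from the relation frequency = 1 / period. A minimal sketch of the conversion is given below; the function names are illustrative, and the upper bound of 1250 ms is taken from the lower end of the reported 1250-1350 ms range.

```python
# Time window for the interval between consecutive stressed syllables,
# in milliseconds, as reported in the text (upper bound: lower end of 1250-1350 ms).
MIN_INTERVAL_MS = 250
MAX_INTERVAL_MS = 1250


def period_ms_to_hz(period_ms):
    """Convert an oscillation period in milliseconds to a frequency in hertz."""
    if period_ms <= 0:
        raise ValueError("period must be positive")
    return 1000.0 / period_ms


def within_stress_window(interval_ms, lo_ms=MIN_INTERVAL_MS, hi_ms=MAX_INTERVAL_MS):
    """True if an inter-stress interval lies inside the reported window."""
    return lo_ms <= interval_ms <= hi_ms


if __name__ == "__main__":
    print(period_ms_to_hz(MIN_INTERVAL_MS))  # 4.0 (Hz)
    print(period_ms_to_hz(MAX_INTERVAL_MS))  # 0.8 (Hz)
    print(within_stress_window(600))         # True
```

With these bounds, the 250 ms minimum corresponds to 4 Hz and the 1250 ms maximum to 0.8 Hz; taking 1350 ms instead would lower the second figure to about 0.74 Hz.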

Syllables and accent phrases as minimal linguistic units

Accent phrases might also be considered as the minimal morphological units, replacing the word, which is obviously an artefact stemming from orthographic conventions. One strong argument comes from the analysis of spontaneous speech. When speakers show so-called disfluency by interrupting an accent phrase before its end (where the stressed syllable is located, in French for instance), the interruption is followed either by the replacement of the intended accent phrase with another, or by a restart of the intended accent phrase from its beginning, and almost never by the completion of the initially intended accent phrase (Blanche-Benveniste, 2003). This suggests that the accent phrase must be pronounced (and read) as a whole, and can be handled in pieces only with difficulty, if at all. Since it contains only one stressed syllable, an accent phrase can only exist and be adequately processed when complete, with its stressed syllable synchronizing its decoding in the reader's or speaker's lexicon of accent phrases.

Accent phrases also appear as the minimal units organized into a hierarchy to form the sentence prosodic structure. This structure is indicated by boundary tones in the Autosegmental-Metrical framework, and by dependency relations indicated by pitch accent melodic contours in alternate theoretical frameworks (Martin, 2018).


Blanche-Benveniste, Claire (2003) La naissance des syntagmes dans les hésitations et répétitions du parler, in Araoui J.L. ed. Le sens et la mesure. Hommages à Benoît de Cornulier, Paris : Honoré Champion, 40-55.

Ghitza, Oded (2011) Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm, Frontiers in Psychology, 2: 130.

Martin, Philippe (2014) Spontaneous speech corpus data validates prosodic constraints, Proceedings of the 6th conference on speech prosody, Campbell, Gibbon, and Hirst (eds.), 525-529.

Martin, Philippe (2018) Intonation, structure prosodique et ondes cérébrales, London : ISTE.

Passy, Paul (1891) Étude sur les changements phonétiques et leurs caractères généraux, Paris : Firmin-Didot. (Thèse pour le doctorat présentée à la Faculté des lettres de Paris).

Wioland, François (1985) Les structures rythmiques du français, Paris : Slatkine-Champion.


Aguilera Marion, Radouane El Yagoubi, Robert Espesser & Corine Astésano (2014) Event-Related Potential investigation of Initial Accent processing in French, Proceedings of the 6th conference on speech prosody, Campbell, Gibbon, and Hirst (eds.), 383-387.

Arnal Luc et Anne-Lise Giraud (2017) Neurophysiologie de la perception de la parole et multisensorialité, in Traité de neurolinguistique, Serge Pinto et Marc Sato éd., Louvain-la-Neuve : De Boeck, 97-108.

Beckman Mary E. (1986) Stress and Non-Stress Accent, Netherlands Phonetic Archives Series. Walter de Gruyter.

Blanche-Benveniste Claire (2000) Approches de la langue parlée en français, Paris : Ophrys, 164 p.

Di Cristo Albert (2016) Les musiques du français parlé, Berlin : De Gruyter Mouton, 513 p.

Fónagy Ivan (1980) L’accent en français : accent probabilitaire, in L’accent en français contemporain, Studia Phonetica, 15, Ivan Fónagy et Pierre Léon (eds.), Didier, Paris, 123-233.

Garde Paul (2013) L’accent, Paris : Lambert-Lucas, 170 p.

Ghitza Oded and S. Greenberg (2009) On the possible role of brain rhythms in speech perception: intelligibility of time-compressed speech with periodic and aperiodic insertions of silence, Phonetica 66 (1-2), 113-126.

Ghitza Oded, Anne-Lise Giraud & David Poeppel (2013) Neuronal oscillations and speech perception: critical-band temporal envelopes are the essence, Frontiers in Human Neuroscience, January 2013, Volume 6, Article 340.

Gilbert Annie & Victor Boucher (2007) What do listeners attend to in hearing prosodic structures? Investigating the human speech-parser using short-term recall, Proc. Interspeech 2007, 430-433.

Hayes, Bruce (1995). Metrical stress theory. The University of Chicago Press: Chicago.

Jun Sun-Ah & Cécile Fougeron (2002) The Realizations of the Accentual Phrase in French Intonation, Probus 14, 147-172.

Keating Patricia and Stefanie Shattuck-Hufnagel (2002) A Prosodic View of Word Form Encoding for Speech Production, UCLA Working Papers in Phonetics, 101: 112-156.

Levey, Sandra & Raphael, Lawrence (2002) Stress clash: Frequency and strategies of resolution, Journal of the Acoustical Society of America, 111, 2476.

Martin, James (1972). Rhythmic (hierarchical) versus serial structure in speech and other behavior. Psychological Review 79(6): 487–509.

Martin Philippe (1975) Analyse phonologique de la phrase française, Linguistics, 146 (Fév. 1975), 35‑68.

Martin Philippe (2015) The Structure of Spoken Language. Intonation in Romance, Cambridge: Cambridge University Press, 206 p.

Martin Randi, Hao Yan and Tatiano Schnur (2014) Working memory and planning during sentence production, Acta Psychologica, 152C, 120-132.

McConkie George W., Roderick N. Underwood, David Zola & G. S. Wolverton (1985) Some Temporal Characteristics of Processing During Reading, Journal of Experimental Psychology: Human Perception and Performance, 11(2), 168-186.

Post Brechtje (1999) Restructured Phonologic Phrases in French, evidence from clash resolution, Linguistics, 37/1, 1999, 41-63.

Quercia Patrick (2010) Ocular movements and reading: a review, J. Fr. Ophtalmologie, 33 (6): 416-423.

Rossi Mario (1978) Interaction of intensity glides and frequency glissandos, Language and Speech (21) 4, 384-396.

Steinhauer Karsten, Kai Alter & Angela D. Friederici (1999) Brain potentials indicate immediate use of prosodic cues in natural speech processing, Nature Neuroscience, 2(2), 191-196.