L2 prosody acquisition
Dorothy M. Chun | University of California, Santa Barbara

This entry first provides some basic terminology and definitions for L2 Prosody Acquisition, including key concepts, such as L1-L2 transfer and perception vs. production. Then, the rationale for why prosody is important for L2 learners is discussed, along with the possible goals of learning/acquiring L2 prosody. This is followed by some prevalent theories of how L2 prosody is acquired and, finally, by a summary of the research on the effectiveness of pronunciation training for learning L2 prosody. [The LBASS entry for L2 Prosody by Silva can be found at]


The terms prosody and prosodic features are often used interchangeably with suprasegmentals, and these suprasegmental features can be measured acoustically in terms of fundamental frequency (F0), intensity, and duration. In auditory terms, the prosody of a word can be described by how its individual syllables are accented (or not). Typical means of accenting or stressing a syllable include a change in pitch (rising or falling), greater or lesser amplitude (perceived as loudness), and longer or shorter duration (length). For an overview of L2 word prosody, see Jongman & Tremblay (2020). At the syllable and word levels, contiguous sounds can be reduced or elided entirely (e.g., Watcha doin’?), and pausing between words or phrases can demarcate tone groups and intonation groups, while also influencing stress and rhythm. At the sentence level, prosody typically refers to intonation, stress, rhythm, and voice quality, in other words, the pitch patterns, stress and rhythm patterns, and pauses that carry meaning or mark information structure (e.g., focus, emphasis, new information, old information, statement, question, exclamation). In addition, one of the functions of prosody is to express attitudes and emotions. For an overview of L2 sentence prosody, see Trouvain & Braun (2020). At the discourse level, prosody functions in discourse intonation to signal such meanings as new topics, topic switches, presuppositions, assumptions, keeping, taking, or yielding the floor in conversations, and ending a conversation (see Chun, 2002).
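The three acoustic correlates named above (F0, intensity, and duration) can be illustrated with a minimal sketch. It is not taken from the studies cited here: it uses a synthetic 200 Hz tone rather than recorded speech, a deliberately simple autocorrelation search for F0, and hypothetical function names (`synth_tone`, `estimate_f0`, `rms_intensity_db`, `duration_s`). Tools used in prosody research, such as Praat, rely on considerably more robust pitch-tracking algorithms.

```python
import math


def synth_tone(f0_hz, length_s, sample_rate=16000, amplitude=0.5):
    """Return a pure sine tone as a list of samples (a stand-in for a voiced vowel)."""
    n = int(length_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * f0_hz * i / sample_rate) for i in range(n)]


def duration_s(samples, sample_rate=16000):
    """Duration in seconds: number of samples divided by the sampling rate."""
    return len(samples) / sample_rate


def rms_intensity_db(samples):
    """Intensity as root-mean-square amplitude in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return float("-inf") if rms == 0 else 20 * math.log10(rms)


def estimate_f0(samples, sample_rate=16000, fmin=75.0, fmax=500.0):
    """Estimate F0 in Hz as the lag with the highest raw autocorrelation.

    The search range (fmin, fmax) is an assumed range for adult speech.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    if len(samples) <= max_lag:
        raise ValueError("signal too short for the requested F0 range")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    tone = synth_tone(200.0, 0.1)
    print("duration (s):", duration_s(tone))
    print("intensity (dB re full scale):", round(rms_intensity_db(tone), 2))
    print("estimated F0 (Hz):", round(estimate_f0(tone), 1))
```

On real speech, the same three measures would be computed frame by frame over a recording, so that rises and falls in F0 and changes in intensity can be tracked across syllables.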

In second language acquisition (SLA), L1-L2 transfer is a common occurrence in that speakers will use features of their L1 when speaking an L2, and this holds for prosody as well. To date, the majority of studies on transfer involve learners of L2 English, and the transfer of various specific aspects of prosody has been investigated. For example, in their study of the transfer of tonal alignment in English pitch accent by L1 speakers of Mandarin Chinese, Lu and Kim (2016) point out that while Mandarin, as a tonal language, uses pitch mainly to signal lexical contrasts, English uses pitch to convey discourse and pragmatic meaning. In a study of L1 Greek learners of L2 English, Kainada and Lengeris (2015) found that the learners transferred the full set of L1 Greek tonal events (in terms of tonal alignment, speech rate, pitch span and pitch level) when speaking L2 English, associating them with stressed syllables. Research on transfer effects in target L2s other than English is steadily increasing, e.g., Rasier and Hiligsmann (2007), who studied L1 French speakers learning L2 Dutch and discovered that they tended not to deaccent given information when speaking Dutch, a language that regularly deaccents it. For an excellent summary of the transfer effects of different prosodic components for different pairs of L1s and L2s, see Trouvain and Braun (2020).

In terms of phonetic and phonological acquisition of an L2, there is debate as to whether L2 learners must be able to perceive L2 sounds and prosody that differ from their L1 before being able to produce them, and whether (and how) training can facilitate or improve perception and production.

Importance of prosody for L2 learners

Because prosody has so many functions at the word, sentence, and discourse levels, it is an important, albeit often overlooked, facet of L2 learning. One of the main reasons that it has been understudied and under-taught is the sheer complexity of prosodic systems in all languages. Linguists have investigated prosody for decades (see Gussenhoven & Chen, 2020), but applied linguists, SLA researchers, and teachers are often less familiar with prosody, and less familiar still with how to teach it.

What then are some realistic goals for L2 speech? Is it to sound as native-like (or accent-free) as possible? Or is it to acquire prosodic patterns that allow for intelligible and comprehensible speech? The nativeness vs. intelligibility question has long been debated (Levis, 2005), but researchers and teachers alike are gradually converging on the view that intelligibility is not only achievable but also, more significantly, necessary and more important than nativeness for effective communication (Levis, 2020). In addition, research has shown that prosody is at least as important as, if not more important than, segmentals for producing comprehensible and intelligible speech (Derwing et al., 1998; Kang et al., 2010).

The task then is to understand how L2 prosody is acquired and/or how it can be taught. It is essential to focus on both perception and production, and to do so in authentic communicative situations. Learners must be trained to perceive prosodic markings that signal meaning in authentic speech and must have opportunities to practice and produce discourse-level utterances, not just sentence-level prosodic patterns.

Theories of acquiring/learning L2 prosody

A substantial amount of research on L2 phonology has been comparative, contrasting the phonological systems (primarily segmentals) of L1 and L2. Similarly, theories of L2 phonological learning have historically confined themselves to segmentals. For example, structuralists such as Lado (1957) proposed the Contrastive Analysis Hypothesis (CAH), according to which a comparison of the linguistic systems of two languages, in this case the phonological systems, could predict the sounds and phonemes that L2 learners would find difficult. Two theories of L2 learning that confined themselves to the phonetic and phonological realms are Best’s (1995) Perceptual Assimilation Model (PAM) and Flege’s (1995) Speech Learning Model (SLM). The PAM theorizes that adults have difficulty in mastering L2 phonetics because they cannot perceive new or unknown sounds as such; they can only perceive unfamiliar sounds in terms of how similar or dissimilar they are to their native phonemes. The SLM, on the other hand, considers L2 phonetic learning over time and proposes that the accuracy with which an L2 learner can perceive L2 vowels and consonants has a direct effect on the accuracy with which the learner can produce those segments.

It was not until 2015 that a proposal for a theory of learning L2 prosody appeared (Mennen, 2015). Mennen’s working model of L2 intonation learning considers predictions from both the PAM and the SLM and suggests how cross-language differences in intonation can predict where L2 prosodic deviations are likely to occur. Among the numerous prosodic features in which L2 speech may deviate are, for example, pitch range (Mennen et al., 2014), pitch patterns in tonal vs. non-tonal languages (Yang, 2013), and rhythm (Li & Post, 2014). In addition, differences in auditory processing (Sun et al., 2021) and the effect of age on the acquisition of L2 prosody (Huang & Jun, 2011) have been studied.

Effectiveness of pronunciation training for acquisition/learning

To help L2 learners acquire L2 prosody, different types of instruction that target awareness, perception, and production of different prosodic features have been implemented (Chun & Jiang, 2022; Chun & Levis, 2020). The effectiveness of instruction has been extensively studied, and meta-analyses of these studies have shown an overall positive effect. These analyses have covered a variety of issues: the general effectiveness of instruction in improving both individual sounds and prosodic features; the effectiveness of technology in computer-assisted pronunciation training (CAPT) applications and programs; the effectiveness of training for perception, production, or both; the effectiveness of training for comprehensibility and intelligibility of L2 speech; and the assessment of pronunciation with subjective human ratings vs. computer-based acoustic analyses.

Starting in the 2000s, pronunciation training often included both segmentals and suprasegmentals. It is important to understand that not all pronunciation features are equally important when it comes to intelligibility, as Saito’s (2012) synthesis of 15 studies revealed. In a meta-analysis of 86 studies, Lee et al. (2015) reported medium to large effects of L2 pronunciation instruction, and also found larger effects when instruction targets both segmentals and suprasegmentals than when either is targeted independently. In terms of listening, a meta-analysis on the effects of instruction on prosodic listening skills found “a large, positive effect on the ability of learners to develop phonological categories, focusing on learners’ skill in integrating suprasegmental features into listening comprehension” (McAndrews, 2019, p. 151). A meta-analysis of 20 CAPT studies (Mahdi & Al Khateeb, 2019) found a medium effect size for studies of segmental features (0.47), as compared to a large effect size for studies of suprasegmental features (0.89). Studies that investigated both types of features showed a medium effect size (0.72).
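Effect sizes such as those reported above are standardized statistics. As an illustration only, the sketch below computes one widely used standardized mean difference, Cohen's d with a pooled standard deviation, for two small groups. The scores and the function name `cohens_d` are invented for the example, and the cited meta-analyses may have used a different estimator or weighting scheme.

```python
import math
import statistics


def cohens_d(treatment, control):
    """Cohen's d: difference in group means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    var1 = statistics.variance(treatment)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical post-test pronunciation ratings (not data from any cited study).
    trained = [72, 75, 78, 80, 85]
    untrained = [70, 72, 74, 76, 78]
    print("Cohen's d:", round(cohens_d(trained, untrained), 2))
```

Because d expresses the group difference in standard-deviation units, values from studies that used different rating scales can be compared and averaged, which is what makes this kind of meta-analytic summary possible.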

Some studies have shown that L2 learners can be taught both to perceive prosodic patterns in the L2 and to improve their L2 production. Many researchers and teachers believe that perception and production go hand in hand, and that when learners are trained first to identify a feature such as contrastive focus, their production performance also improves (Hirata, 2004; Muller Levis & Levis, 2012). Figure 1 shows an example of how visual feedback with pitch contours of sentence focus (highlighted in yellow) in English can be used to train L2 learners (Jiang & Chun, 2021).


Figure 1. Visual feedback provided by Praat (waveform, pitch contours and transcription from top to bottom).

However, it should be acknowledged that the great majority of studies of pronunciation instruction have used word- and sentence-reading tasks rather than spontaneous speech, and the efficacy of instruction remains unclear when performance in authentic conversational speech is judged by human raters. Future research should focus on the acquisition of discourse-level L2 prosody.


