Conversation Analysis
Vered Silber-Varod | The Tel Aviv University Center for AI and Data Science

Conversation Analysis (CA) is a research approach that focuses on the structure of spoken interactions, examining turn-taking patterns and the organization of conversation. It differs from discourse analysis by emphasizing the structure of the conversation rather than its content. CA aims to understand the context and the relationships between participants, and one of its goals is to assign patterns of turn-taking to specific genres of conversation. It is widely used in the study of human social interaction and has implications for the development of conversational agents (Clark et al., 2019). CA scholars use a dedicated terminology to define units of spoken language so that they can answer questions such as “what are the cues of turn completion?” or, in a multiparty conversation, “can we predict who speaks next?”. Following such definitions, one can start delineating the rules of the conversation. As Sacks, Schegloff, & Jefferson (1974) put it, turn-taking in conversation is “locally managed, party-administered, interactionally controlled.” As such, conversation rules are never fulfilled in full. In that sense, they are not “road signs” but serve as guidelines that speakers tend to follow in order to communicate properly.

Conversation, as the most basic instance of language use, is therefore traditionally situated in the relationship between the structure of the language and the structure of the conversation. Further, as conversation is produced by speakers (human or artificial) who belong to different socialization agents, CA is the dominant approach to the study of human social interaction across the disciplines of Sociology, Communication, and Linguistics (Stivers & Sidnell, 2013; Sidnell, 2016). The detailed linguistic information, the grammatical formatting, is revealed only as the turn-at-talk unfolds and is what helps the recipient understand the message (Mushin & Doehler, 2021). Unlike in written chats, spoken interaction offers the recipient another set of “front-loaded” information: prosody (Selting, 2010; Bergmann, 2018), gaze, and turn-initial tokens (Kendrick, Holler, & Levinson, 2023; Kurata et al., 2023). These cues are encoded earlier in the turn and help the recipient foresee the intentions of the speaker (Levinson, 1983).

Sacks, Schegloff, and Jefferson (1974) defined five elements to frame CA:

  1. Conversation is not a task to fulfil. In its basic sense, it does not have practical implications. It is primarily a social activity.
  2. Power relations in conversation are dynamic and expose the speakers’ self-positioning and less their status or static roles.
  3. A conversation is an interaction with only a few participants.
  4. Turn-taking in conversation is “locally managed, party-administered, interactionally controlled.” The consequence of this is that turns are short sequences of speech.
  5. The conversation is intended for its participants and not for listeners outside it.

If one of those elements is absent or changed, then a fundamental deviation in the organization of the talk occurs.

CA examines several key elements of spoken interaction. The following concepts are fundamental to understanding its structure and dynamics:

  1. Turn-taking:

Turn-taking refers to the way speakers take turns at talking in a coordinated manner during a conversation. It involves the orderly exchange of speaking turns between participants, with one speaker yielding the floor to another to maintain the flow of conversation (Sinclair & Coulthard, 1975, 1992; White, 2003).
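The timing of this orderly exchange can be made concrete with a small computation. The following is a minimal sketch, assuming each turn has been annotated with a speaker label and start and end times in seconds; the `Turn` class, the `transitions` function, and the sample data are hypothetical and serve only as illustration. A positive offset marks a gap between speakers and a negative offset marks overlapping speech.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start: float  # seconds from the beginning of the recording
    end: float


def transitions(turns):
    """Return (previous speaker, next speaker, offset) for every speaker change.

    offset > 0 means a silent gap; offset < 0 means overlapping speech.
    """
    ordered = sorted(turns, key=lambda t: t.start)
    result = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev.speaker != nxt.speaker:
            result.append((prev.speaker, nxt.speaker, round(nxt.start - prev.end, 3)))
    return result


# Hypothetical annotated turns (not taken from a real corpus).
turns = [
    Turn("A", 0.0, 2.0),
    Turn("B", 2.5, 4.0),   # B starts 0.5 s after A finishes: a gap
    Turn("A", 3.75, 6.0),  # A starts 0.25 s before B finishes: an overlap
]

print(transitions(turns))  # [('A', 'B', 0.5), ('B', 'A', -0.25)]
```

The sketch compares only consecutive turns ordered by start time, which is a simplification; multiparty talk with several simultaneous speakers would need a more careful treatment.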

  2. Repair and correction mechanisms:

Repair mechanisms in spoken language refer to the ways in which speakers correct misunderstandings or errors in conversation. These mechanisms are not typically found in written language and are essential for maintaining the coherence and effectiveness of spoken interactions. Repair can occur in the form of self-repairs, where a speaker corrects their own speech, or repairs of others, where a participant assists in correcting another speaker's utterance. In the context of CA, repairs are seen as collaborative efforts to ensure effective communication, rather than interruptions. Self-repairs often involve disfluencies that signal the speaker's difficulty in formulating their message, while repairs of others demonstrate a cooperative approach to resolving communication breakdowns.

  3. Adjacency pairs and Sequence organization:

In the context of CA, an "adjacency pair" refers to a specific type of conversational structure or sequence of utterances that consists of two related turns, typically spoken by different participants in a conversation (Schegloff & Sacks, 1973). These two turns are closely linked and are often characterized by a predictable pattern of interaction. The first turn in an adjacency pair, known as the "first pair part," typically sets up or initiates a particular type of speech act or action. The second turn, the "second pair part," is a response to the first pair part and is expected to be related in content or function. Common examples of adjacency pairs include opening pairs, question-answer pairs, offer-acceptance pairs, and complaint-apology pairs. Adjacency pairs are essential for understanding how conversational partners coordinate their communication, as they provide a structured framework for the exchange of information, requests, and social actions in everyday conversations (Schegloff & Sacks, 1973).

Sequence organization in CA refers to the way in which turns at talk are ordered and combined to make actions take place. It involves the systematic principles that govern the order and combination of turns at talk. This includes the organization of adjacency pairs, preference structures, and the role of sequence organization in repair and topic management (Schegloff, 2007; Stivers, 2013).
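The notion of a first pair part followed by a matching second pair part can be illustrated with a short sketch. It assumes that each turn has already been labeled with a speaker and an action type. The label set, the `EXPECTED_SECOND` mapping (including the treatment of an opening pair as greeting-greeting), and the function name are hypothetical choices made for illustration, not an established annotation scheme.

```python
# Hypothetical mapping from a first pair part to the second pair parts it makes relevant.
EXPECTED_SECOND = {
    "greeting": {"greeting"},   # assumed form of an opening pair
    "question": {"answer"},
    "offer": {"acceptance"},
    "complaint": {"apology"},
}


def find_adjacency_pairs(turns):
    """Return index pairs (i, i + 1) where turn i + 1 completes the pair opened by turn i.

    turns is a list of (speaker, action) tuples. A pair is counted only when the two
    turns come from different speakers and the second action matches the first.
    """
    pairs = []
    for i in range(len(turns) - 1):
        (speaker1, action1), (speaker2, action2) = turns[i], turns[i + 1]
        if speaker1 != speaker2 and action2 in EXPECTED_SECOND.get(action1, set()):
            pairs.append((i, i + 1))
    return pairs


# Hypothetical labeled dialogue.
dialogue = [
    ("A", "greeting"),
    ("B", "greeting"),
    ("A", "question"),
    ("B", "answer"),
    ("B", "offer"),
    ("A", "acceptance"),
]

print(find_adjacency_pairs(dialogue))  # [(0, 1), (2, 3), (4, 5)]
```

The sketch requires the two parts to be strictly adjacent. That is a deliberate simplification: in real talk other material may come between the two parts, and handling it would require a richer model of sequence organization.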

  4. Preference organization:

Preference organization is a fundamental aspect of understanding the social dynamics and decision-making processes that occur within spoken interactions. Speakers typically prefer to produce the second part of an adjacency pair that matches the first part (e.g., to answer a question rather than refuse it). This is called “preference for confirmation” (Heritage & Watson, 1979; Stivers, Sidnell, & Bergen, 2018).

  5. Topic management:

Topic management in CA refers to the ways in which speakers introduce, maintain, and transition between topics in conversation. It involves the study of how participants collaboratively negotiate and shift the focus of discussion, as well as how they signal their understanding of the current topic. This aspect of CA provides insights into the dynamics of topic initiation, development, and closure in spoken interactions.

  6. Speech acts and pragmatics:

Speech acts and pragmatics are essential components of CA. Speech act theory, as proposed by J. L. Austin and further developed by J. R. Searle, focuses on the performative function of language, emphasizing that utterances not only convey information but also perform actions. Pragmatics, on the other hand, examines how talk is situated within the socio-cognitive worlds of participants (Haugh, 2012), including situational, cultural, and linguistic factors that influence the interpretation of utterances. It also encompasses the study of indexicality, where words or expressions derive meaning from context, and the role of prosody and paralinguistic information in conveying meaning in conversation (Levinson, 1983).

All these elements are crucial for understanding the structure and dynamics of human social interaction in spoken conversations.

The recordings of naturally occurring interaction are therefore the primary data of CA scholars and transcripts are designed to make this primary data available for intensive analytic consideration, providing a basis for detailed analysis of turn-taking, repair, preference organization, and other key elements of spoken interactions (Seedhouse, 2005). Transcription plays a crucial role in CA as it provides the means to systematically capture and analyze spoken interactions. It involves the detailed representation of spoken language, including speech sounds, pauses, prosodic information, overlapping speech, and other features of talk. It allows researchers to capture, document and share their insights on the rich and complex nature of spoken interactions.

However, a crucial pivot for understanding the nuanced aspects of human communication is the significance of context and indexicality in conversation. Context is not fully represented in transcripts. While transcripts capture the spoken words and some non-verbal elements, they may not fully convey the situational, cultural, and linguistic context in which the conversation takes place. This can limit the understanding of the full context in which the interaction occurs, including the physical environment, participants' backgrounds, and other contextual factors that influence the interpretation of utterances. Context provides the situational, cultural, and linguistic backdrop against which utterances are interpreted, influencing the meaning and implications of speech acts.

Indexicality is the phenomenon whereby some linguistic expressions are systematically dependent on the context for their interpretation. Deictic expressions such as “that”, “here”, and “now”, and indexical markers (e.g., “I”, “you”, “his”), serve to point towards an object that was mentioned previously in the conversation or an entity that is known to the participants through the context outside the utterance itself (Levinson, 1983). Indexing may also be produced by a mere gesture, like pointing a finger at something, or by other bodily practices (Auer & Stukenbrock, 2022).

In recent years, the field of CA has become the academic infrastructure for the emerging technologies of conversational agents that are nowadays ubiquitous (Xu, 2023). CA knowledge has been applied in various settings, including clinical, educational, and digital communication. In clinical settings, CA has been used in Voice Assistant (VA)-supported therapy (Siegert et al., 2023). In educational settings, CA has been applied to understand the dynamics of learning in science, technology, engineering, and mathematics (STEM) educational contexts, as well as to explore the interactions between participants and their impact on learning. In digital communication, CA has been used to study the political interview as an institutional interactional setting (Goodwin & Heritage, 1990) and to analyze text-based service conversations in chat box and social network applications.

Conversation AI is often mentioned to highlight the emerging concept of Conversation Intelligence (CI) and its significance in understanding human-human communication (Silber-Varod, 2018). Conversation AI refers to algorithms that model interactions (Reichl & Hammer, 2004) and are used to develop computer-mediated communication applications, such as chatbots or virtual agents (Xu, 2023), that use natural language processing and machine learning to imitate human interactions. CI technologies play a role in the accumulation of data during conversations, including vocal features and body gestures, which are then analyzed to gain insights into communication patterns and structures (Pekarek Doehler, Keevallik, & Li, 2022). CI technology has been adopted by businesses to improve sales and marketing performance, using parameters such as talking-listening ratios and turn-taking ratios. However, the applied field of conversational agents is still under development. Further research is needed to establish consistency among studies, and challenges such as the performance of automatic speech recognition systems and interruptions in dialogue flow remain.
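Parameters such as talking-listening ratios and turn-taking ratios can be derived from time-stamped turns. The sketch below is one possible operationalization under stated assumptions, not the definition used by any particular CI product: a speaker's listening time is assumed to be the time during which the other participants speak, and the turn share is assumed to be the speaker's fraction of all turns. The function name, the metric names, and the sample call are hypothetical.

```python
from collections import defaultdict


def conversation_metrics(turns):
    """Compute per-speaker talk share, talk-listen ratio, and turn share.

    turns is a list of (speaker, start, end) tuples with times in seconds.
    Assumes at least one turn has a positive duration.
    """
    talk_time = defaultdict(float)
    turn_count = defaultdict(int)
    for speaker, start, end in turns:
        talk_time[speaker] += end - start
        turn_count[speaker] += 1

    total_time = sum(talk_time.values())
    total_turns = sum(turn_count.values())

    metrics = {}
    for speaker, spoken in talk_time.items():
        # Assumption: "listening" is the time during which the other speakers talk.
        listening = total_time - spoken
        metrics[speaker] = {
            "talk_share": spoken / total_time,
            "talk_listen_ratio": spoken / listening if listening else float("inf"),
            "turn_share": turn_count[speaker] / total_turns,
        }
    return metrics


# Hypothetical sales call: the agent talks for 9 s, the customer for 5 s.
call = [
    ("agent", 0.0, 6.0),
    ("customer", 6.0, 8.0),
    ("agent", 8.0, 11.0),
    ("customer", 11.0, 14.0),
]

for speaker, values in conversation_metrics(call).items():
    print(speaker, values)
```

In this example both speakers take the same number of turns, yet the agent holds the floor for most of the time, which shows why turn counts and talk time are worth reporting separately.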

Other fields related to this entry are Interactive Communication Management (ICM) (Allwood, 2001; 2008), conversational systems, speech communication, Interactional Linguistics (Couper-Kuhlen & Selting, 2018), positioning and power relations (Rienks & Heylen, 2005), and speech analytics (Scheidt & Chung, 2019).


Allwood, J. (2001). The structure of dialog. In M. M. Taylor, D. G. Bouwhuis, & F. Néel (Eds.), The structure of multimodal dialogue II (pp. 3-24). Amsterdam, Netherlands: John Benjamins.

Allwood, J. (2008). Dimensions of embodied Communication - towards a typology of embodied communication. In I. Wachsmuth, M. Lenzen, & G. Knoblich (Eds.), Embodied communication in humans and machines (pp. 1-24). Oxford, UK: Oxford University Press.

Auer, P., & Stukenbrock, A. (2022). Deictic reference in space. In: Jucker, A. H., & Hausendorf, H. (Eds.). Pragmatics of Space (Vol. 14) (pp. 23-61). Walter de Gruyter GmbH & Co KG.

Bergmann, P. (2018). Prosody in interaction. In: M. Heinz, & M. C. Moroni (Eds.). Prosody: Information Structure, Grammar, Interaction, Special issue of Linguistik Online. DOI:10.13092/lo.88.4188

Clark, L., et al. (2019). What makes a good conversation? Challenges in designing truly conversational agents. In Proceedings of the 2019 CHI conference on human factors in computing systems (pp. 1-12).

Couper-Kuhlen, E. & Selting M. (2018). Interactional Linguistics: Studying Language in Social Interaction. Cambridge: Cambridge University Press.

Goodwin, C. and Heritage, J. (1990). Conversation Analysis. Annual Review of Anthropology. 19(1), 283-307.

Haugh, M. (2012). Conversational interaction. In K. Allan & K. Jaszczolt (Eds.), The Cambridge Handbook of Pragmatics (Cambridge Handbooks in Language and Linguistics, pp. 251-274). Cambridge: Cambridge University Press.

Heritage, J., & Watson D. R. (1979). Formulations as conversational objects. In: Psathas G. (ed.) Everyday Language: Studies in Ethnomethodology. New York: Irvington Publishers, pp. 123–162.

Kendrick, K. H., Holler, J., & Levinson, S. C. (2023). Turn-taking in human face-to-face interaction is multimodal: gaze direction and manual gestures aid the coordination of turn transitions. Philosophical Transactions of the Royal Society B, 378(1875), 20210473.

Kurata, F., Saeki, M., Fujie, S., & Matsuyama, Y. (2023). Multimodal Turn-Taking Model Using Visual Cues for End-of-Utterance Prediction in Spoken Dialogue Systems. Proc. INTERSPEECH 2023 (pp. 2658-2662), DOI: 10.21437/Interspeech.2023-578

Levinson, S. C. (1983). Pragmatics. Cambridge Textbooks in Linguistics. Cambridge University Press.

Mushin, I., & Doehler, S. P. (2021). Linguistic structures in social interaction: Moving temporality to the forefront of a science of language. Interactional Linguistics, 1(1), 2-32.

Pekarek Doehler, S., Keevallik, L., & Li, X. (2022). The grammar-body interface in social interaction. Frontiers in Psychology, 13, 875696.

Reichl, P., & Hammer, F. (2004). Hot discussion or frosty dialogue? Towards a temperature metric for conversational interactivity. Proc. Interspeech 2004 (pp. 317-320), DOI: 10.21437/Interspeech.2004-147

Rienks, R., & Heylen, D. (2005). Dominance detection in meetings using easily obtainable features. Proceedings of the International Workshop on Machine Learning for Multimodal Interaction (pp. 76-86). Berlin, Heidelberg: Springer.

Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn taking for conversation. In Studies in the organization of conversational interaction (pp. 7-55). Academic Press.

Schegloff, E. A., & Sacks, H. (1973). Opening up Closings. Semiotica 8: 289-327.

Scheidt, S., & Chung, Q. B. (2019). Making a case for speech analytics to improve customer service quality: Vision, implementation, and evaluation. International Journal of Information Management, 45, 223-232.

Seedhouse, P. (2005). Conversation Analysis and language learning. Language Teaching, 38(4), 165-187. doi:10.1017/S0261444805003010

Selting, M. (2010). Prosody in interaction: State of the art. In: D. Barth-Weingarten, E. Reber, & M. Selting (eds.), Prosody in interaction. Amsterdam, Benjamins: 3–40.

Schegloff, E. (2007). Sequence Organization in Interaction. Cambridge: Cambridge University Press.

Sidnell, J. (2016). Conversation Analysis. Oxford Research Encyclopedia of Linguistics. Retrieved 3 Oct. 2023, from

Siegert, I., Busch, M., Metzner, S., & Krüger, J. (2023). Voice Assistants for Therapeutic Support – A Literature Review. In: Salvendy, G., Wei, J. (eds) Design, Operation and Evaluation of Mobile Communications. HCII 2023. Lecture Notes in Computer Science, vol 14052. Springer, Cham.

Silber-Varod, V. (2018). Is human-human spoken interaction manageable? The emergence of the concept Conversation Intelligence. Online Journal of Applied Knowledge Management (OJAKM). A Publication of the International Institute for Applied Knowledge Management, 6(1), 1-14.

Sinclair, J. & Coulthard, M. (1975). Toward an Analysis of Discourse: the English Used by Teachers and Pupils. Oxford University Press.

Sinclair, J. & Coulthard, M. (1992). Toward an analysis of discourse. In Coulthard, M. (ed.), Advances in Spoken Discourse Analysis. 1-34. Routledge.

Stivers, T., Sidnell, J., & Bergen, C. (2018). Children's responses to questions in peer interaction: A window into the ontogenesis of interactional competence. Journal of Pragmatics, 124, 14-30.

Stivers, T. (2013). Sequence Organization. In: Sidnell, J. & Stivers, T. (eds.) (2013). The Handbook of Conversation Analysis (pp. 191-209). Malden, MA: Wiley-Blackwell.

Stivers, T., & Sidnell, J. (2013). Introduction. In: Sidnell, J. & Stivers, T. (eds.) The Handbook of Conversation Analysis (pp. 1-8). Malden, MA: Wiley-Blackwell.

White, A. (2003). The application of Sinclair and Coulthard’s IRF structure to a classroom lesson: Analysis and discussion. University of Birmingham, 1-17.

Xu, Y. (2023). Talking with machines: Can conversational technologies serve as children's social partners?. Child Development Perspectives, 17(1), 53-58.