Digital Linguistic Biomarker
Gloria Gagliardi | University of Bologna


Natural language processing (NLP) and artificial intelligence (AI) are becoming increasingly popular in the clinical community (Wang et al., 2020; Locke et al., 2021). In particular, growing interest surrounds the exploitation of speech and language as digital biomarkers, namely ‘objective, quantifiable behavioral data that can be collected and measured by means of digital devices, allowing for low-cost pathology detection, classification, and monitoring’ (Gagliardi et al., 2021: 1). In a nutshell, this technique consists of automatically detecting subtle verbal changes in speech recordings, transcripts, or written texts produced by patients.

In what follows, we will provide an overview of this emerging research field by sketching its theoretical background, methodological implementation, and possible clinical application.

Theoretical background

Language is a complex cognitive function relying on broadly distributed neural networks (Catani et al., 2005; 2012; Braga et al., 2020). Its anatomical basis includes not only the renowned left-dominant perisylvian language network (i.e., an extended region encompassing Broca’s, Wernicke’s, and Geschwind’s territories and the arcuate tract connecting them) but also a complex set of cortical and subcortical brain structures associated with motor and sensory-related representations, non-verbal memory skills, emotional processing, and executive functions (Hagoort, 2017; Hertrich et al., 2020).

Consequently, even minor brain changes due to mental health issues (e.g., psychological distress, reversible or progressive cerebral atrophy, cerebral damage resulting from strokes, traumatic injuries, or infections) or developmental disorders can result in subtle language alterations.

Assessing communication skills at various levels through standardized psychometric instruments—the so-called ‘paper-and-pencil tests’—is time-consuming and prone to human bias. By contrast, verbal competence can be easily measured by extracting and quantifying a rich set of linguistic features (e.g., phonetic, lexical, and syntactic traits) through NLP techniques: This approach is known as digital linguistic biomarker (‘DLB’) computation.

Indeed, the usage of DLBs, which provide offline and online measures of the cognitive activities underlying language processing, has turned out to be a cost-effective method for assessing and monitoring cognitive alterations due to various developmental and acquired mental disorders. The advantages of the DLB-based evaluation procedure are its objectiveness (and reproducibility), unobtrusiveness, and affordability (Gagliardi et al., 2021). Moreover, this novel approach opens up the possibility of exploring several levels of speech and language production, such as prosody and rhythm, which otherwise cannot be examined by human operators in clinical settings (Beltrami et al., 2018). Last but not least, it can be administered remotely through telemedicine devices. As the COVID-19 outbreak has shown, telehealth is of utmost importance, as it keeps both healthcare providers and patients safer during extreme events while ensuring appropriate assistance (König et al., 2021). Beyond the pandemic, it will also be essential for reaching out to and regularly monitoring patients affected by chronic diseases, who require continuous follow-up. Taken together, these advantageous characteristics and exigent conditions make DLBs particularly suitable for life-course assessment.

DLBs: An overview of methodologies and open challenges

In the last few years, a large number of research papers have been devoted to DLBs. The literature documents several possible strategies to extract and exploit them. However, it is possible to pinpoint at least some common basic steps:

  1. Collection (and optional annotation) of the written/oral texts produced by a cohort of patients and healthy controls. Most of the time, studies adopt an observational retrospective case-control setting (Mann, 2003).

  2. Extraction of quantitative linguistic parameters through NLP technologies (Jurafsky & Martin, 2008) and (optional) preliminary testing of their discriminative power by inferential statistics (Asadoorian & Kantarelis, 2005) or feature selection algorithms (Liu, 2011).

  3. Classification of these data through automatic learning strategies and the evaluation of results. To date, the main tasks include the following: i) to detect linguistic alterations due to neurodevelopmental/acquired disorders in a screening/diagnostic perspective, ii) to classify possible subtypes of a given pathology, iii) to predict the score of a neuropsychological standardized test, and iv) to monitor the progression of linguistic symptoms over time.
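The three basic steps above can be sketched end-to-end in a few lines of Python. The transcripts, labels, feature, and threshold classifier below are purely illustrative, not taken from any of the cited studies:

```python
# Step 1: a (toy) annotated corpus of transcripts with diagnostic labels
# ('AD' = patients, 'HC' = healthy controls; the texts are invented).
corpus = [
    {"text": "I went to the market and bought fresh bread and cheese.", "label": "HC"},
    {"text": "Yesterday we walked along the river and talked about the trip.", "label": "HC"},
    {"text": "I went... the thing... the place with food.", "label": "AD"},
    {"text": "She... it was... I forget the word for it.", "label": "AD"},
]

# Step 2: extract one quantitative linguistic parameter per text
# (here, the type-token ratio: distinct word forms over total words).
def ttr(text):
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    return len(set(tokens)) / len(tokens)

# Step 3: a minimal classifier "trained" as a threshold halfway
# between the two class means of the feature.
def class_mean(label):
    vals = [ttr(r["text"]) for r in corpus if r["label"] == label]
    return sum(vals) / len(vals)

threshold = (class_mean("HC") + class_mean("AD")) / 2

def predict(text):
    return "HC" if ttr(text) >= threshold else "AD"

# Evaluation: fraction of correctly classified samples.
preds = [predict(r["text"]) for r in corpus]
accuracy = sum(p == r["label"] for p, r in zip(preds, corpus)) / len(corpus)
```

Real studies, of course, use held-out test data, many features, and proper learning algorithms; the point here is only the shape of the pipeline.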

Going into detail, the enrollment of patients represents a crucial step. As far as possible, the cohorts should be balanced considering—at least—sex, age, and education, as conclusions based on skewed corpora are prone to bias, especially in small datasets. At present, data scarcity represents the greatest limitation in applying this technique to clinical practice.
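A balance check of the kind just described can be made concrete with a two-proportion z-test; the cohort counts below are made up for illustration:

```python
import math

def two_proportion_z_test(k1, n1, k2, n2):
    """Two-sided z-test for the difference of two proportions.
    k = count of interest (e.g., female participants), n = group size."""
    p1, p2 = k1 / n1, k2 / n2
    p_pool = (k1 + k2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical cohort: 18 of 40 patients are female vs. 22 of 38 controls.
z, p = two_proportion_z_test(18, 40, 22, 38)
balanced = p > 0.05  # no significant sex imbalance at the 5% level
```

The same check can be repeated for age and education (with a t-test or Mann-Whitney test for continuous variables) before any classifier is trained.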

DLBs can be computed directly from audio files or from texts/transcripts. Transcription can be done manually by human annotators or with an automatic speech recognizer (ASR). The latter scenario is preferable for real-life clinical application; however, current ASR tools are not yet satisfactorily reliable for the automatic recognition of pathological speech, at least for languages other than English (Gagliardi & Tamburini, 2022).

DLBs can be extracted through customized algorithms (e.g., Gagliardi & Tamburini, 2022) or open-source tools (e.g., for acoustic indices, OpenSMILE; Eyben et al., 2010). According to the literature, a surprisingly high number of DLBs have been tested. For the sake of clarity, they can be classified into three groups (Voleti et al., 2020: 284; de la Fuente Garcia et al., 2020: 1552; Petti et al., 2020: 1791).

  1. Speech-based features, which encompass speech cues conveying linguistic or paralinguistic information (e.g., emotions and irony) and are directly extracted from audio samples. They can be further divided as follows:

    1. Acoustic DLBs (e.g., López-de-Ipiña et al., 2015, 2018; Haider et al., 2020), dealing with the following:

      • Prosody and, particularly, the temporal properties of utterances (e.g., pause, speech, or articulation rate) and F0 (the fundamental frequency of the voice).

      • Loudness and energy.

      • Spectral properties, including formant trajectories (i.e., F1, F2, and F3), mel-frequency cepstral coefficients (MFCCs), and spectral centroid.

      • Vocal quality, such as jitter, shimmer, and harmonic-to-noise ratio (HNR).

    2. Rhythmic DLBs, quantifying the variability of syllabic intervals, e.g., the percentage of vocalic/consonantal intervals (cf. Ramus et al., 1999; Dellwo, 2006).

  2. Text-based features, which are computed on written texts or transcripts and estimate properties at the following linguistic levels:

    1. Lexical DLBs, probing vocabulary richness (e.g., type-token ratio), the ‘density’ of verbal productions (e.g., content density and idea density), the rates of parts of speech (e.g., the percentage of nouns, verbs, adjectives, and pronouns), or the incidence of specific lexical-semantic categories (cf. LIWC - Linguistic Inquiry and Word Count, Tausczik & Pennebaker, 2010).

    2. Syntactic DLBs, quantifying the complexity of sentence structure on constituency-based or dependency-based parse trees.

    3. Semantic DLBs, exploring the meaning of the texts (e.g., through matrix decomposition methods such as LSA - latent semantic analysis and PCA - principal component analysis, embeddings, topic modeling, and sentiment analysis).

    4. Pragmatic DLBs, measuring the usage of deictics and the coherence of the text.

  3. Extra-linguistic/multimodal features, which consider the gaze (e.g., Fraser et al., 2019), smile (e.g., Tanaka et al., 2017), and gait (e.g., Shinkawa et al., 2019) and are collected through wearable sensors.
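To make the lexical level of the taxonomy concrete, the sketch below computes three of the features named above from a raw string. The function-word and pronoun lists are small illustrative stand-ins; a real pipeline would use a POS tagger and a lexicon such as LIWC instead:

```python
# Illustrative stand-ins for real lexical resources.
FUNCTION_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "it",
                  "is", "was", "i", "she", "he", "we", "they"}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}

def lexical_dlbs(text):
    """Return a small dictionary of lexical digital linguistic biomarkers."""
    tokens = [t.strip(".,;!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    n = len(tokens)
    return {
        # Vocabulary richness: distinct word forms over total words.
        "type_token_ratio": len(set(tokens)) / n,
        # Content density: share of (presumed) content words.
        "content_density": sum(t not in FUNCTION_WORDS for t in tokens) / n,
        # Incidence of a specific lexical category (here, pronouns).
        "pronoun_rate": sum(t in PRONOUNS for t in tokens) / n,
    }

features = lexical_dlbs("She said it was the best trip of the year.")
```

Each value is a single number per text, which is exactly the form the downstream machine-learning step expects.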

As the reader can notice, machine learning (ML)—i.e., the subfield of AI that ‘is concerned with the question of how to construct computer programs that automatically improve with experience’ (Mitchell, 1997: XV) to automatically detect meaningful patterns in data—is crucial to this research program.

A wide range of algorithms have been applied to this task: The choice mostly comes down to the corpus size. In simple terms, patients’ data are usually annotated with labels for the target diagnosis (e.g., ‘AD’ for Alzheimer’s disease and ‘HC’ for healthy controls), established based on clinical evaluation. This information (i.e., the ‘ground truth’) is provided to the ML algorithm for training: This procedure enables the ML system to ‘learn’ how to discriminate the diagnostic classes and to predict the class of previously unseen data on a statistical basis. Conventional supervised algorithms of this sort include naïve Bayes classifiers, k-nearest neighbors, logistic regression, support vector machines (SVMs), random forests (RFs), and decision trees. Less commonly, researchers use ‘unsupervised’ learning modes: In this case, the algorithm is not provided with any pre-annotated labels and must seek to self-discover hidden patterns within unstructured data (Jo, 2021). To date, due to small datasets, a smaller number of studies have utilized artificial neural networks and deep learning methods (Aggarwal, 2019) and even fewer have utilized transfer learning-based approaches and language models (Yang et al., 2020) such as BERT (Devlin et al., 2019).
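As a minimal illustration of the supervised setting, the sketch below implements a one-dimensional Gaussian naïve Bayes classifier from scratch. The speech-rate values and labels are hypothetical; real studies would use a library such as scikit-learn and many features rather than one:

```python
import math
from collections import defaultdict

class GaussianNB1D:
    """Gaussian naive Bayes over a single numeric feature."""

    def fit(self, xs, ys):
        # Group feature values by their ground-truth label.
        groups = defaultdict(list)
        for x, y in zip(xs, ys):
            groups[y].append(x)
        # Per class: mean, variance, and log prior probability.
        self.stats = {}
        n = len(xs)
        for label, vals in groups.items():
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-9
            self.stats[label] = (mu, var, math.log(len(vals) / n))
        return self

    def predict(self, x):
        # Pick the class maximizing log prior + Gaussian log likelihood.
        def log_posterior(label):
            mu, var, log_prior = self.stats[label]
            return log_prior - 0.5 * (math.log(2 * math.pi * var)
                                      + (x - mu) ** 2 / var)
        return max(self.stats, key=log_posterior)

# Hypothetical speech rates (syllables/second): slower in the patient group.
xs = [4.1, 4.3, 4.0, 4.4, 2.9, 3.1, 2.8, 3.0]
ys = ["HC", "HC", "HC", "HC", "AD", "AD", "AD", "AD"]
model = GaussianNB1D().fit(xs, ys)
```

A new speaker’s feature value can then be passed to `model.predict` to obtain a predicted diagnostic class.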

The performance of these systems is measured through ‘classical’ clinical or information retrieval evaluation metrics, respectively:

  • Receiver operating characteristics curve (ROC), ‘area under the curve’ (AUC), and ‘equal error rate’ (EER).

  • ‘Accuracy’ (i.e., the number of correctly predicted samples over the total number of samples), ‘precision’ (i.e., the fraction of relevant samples among the retrieved samples), ‘recall’ (i.e., the fraction of the total amount of relevant samples retrieved), and ‘F-measure’ (or ‘F1-score,’ i.e., the harmonic mean of precision and recall).
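The information-retrieval metrics just defined are straightforward to compute from a list of predictions; the sketch below does so for the binary case, with purely illustrative labels:

```python
def binary_metrics(y_true, y_pred, positive="AD"):
    """Accuracy, precision, recall, and F1 for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

y_true = ["AD", "AD", "AD", "HC", "HC", "HC"]
y_pred = ["AD", "AD", "HC", "HC", "HC", "AD"]
m = binary_metrics(y_true, y_pred)
```

ROC curves, AUC, and EER are obtained analogously, by sweeping the classifier’s decision threshold and tracking the resulting true- and false-positive rates.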

Current and future applications

As suggested in the previous pages, speech and language can represent a valuable source of information on cognitive status, as they depart from the norm in neurodevelopmental disorders, change depending on psychological symptoms, and decline in parallel with neurodegeneration.

Dementia screening was the first and remains the most rapidly evolving domain of clinical application for DLBs (de la Fuente Garcia et al., 2020). Nowadays, however, this groundbreaking approach is also being used to recognize other clinical or psychiatric conditions in the adult population, such as cognitive dysfunctions associated with metabolic disorders (e.g., type 2 diabetes mellitus, cf. Imre et al., 2019), depression (De Souza et al., 2021), dysarthria (e.g., due to Parkinson’s disease, cf. Rahman et al., 2021), and psychosis (Tang et al., 2022), as well as developmental disorders such as autism spectrum disorder (Patel et al., 2020; Beccaria et al., 2022).


I am grateful to Professor Tommaso Raso for asking me to write this short article. I also thank Alice Suozzi for her critical reading of the text and insightful suggestions.


Introductory Bibliography

Jurafsky D. & Martin J.H. (2008). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition [2 ed.]. Hoboken (NJ): Prentice Hall.

de la Fuente Garcia S., Ritchie C.W. & Luz S. (2020) Artificial Intelligence, speech, and language processing approaches to monitoring Alzheimer’s Disease: a systematic review. Journal of Alzheimer’s Disease, 78(4): 1547–1574.

Gagliardi G., Kokkinakis D. & Duñabeitia J.A. (2021) Editorial: Digital Linguistic Biomarkers: Beyond Paper and Pencil Tests. Frontiers in Psychology, 12:752238.

Gagliardi G. & Tamburini F. (2022) The Automatic Extraction of Linguistic Biomarkers as a Viable Solution for the Early Diagnosis of Mental Disorders. In Calzolari, N. et al. (eds.) Proceedings of the 13th Language Resources and Evaluation Conference (LREC 2022). Paris: ELRA, 5234–5242.

Petti U., Baker S. & Korhonen A. (2020) A systematic literature review of automatic Alzheimer’s disease detection from speech and language. Journal of the American Medical Informatics Association, 27(11): 1784-1797.

Voleti R., Liss J.M. & Berisha V. (2020) A review of automated speech and language features for assessment of cognitive and thought disorders. IEEE Journal of Selected Topics in Signal Processing, 14(2): 282-298.

Advanced Bibliography

Aggarwal C.C. (2019) Neural Networks and Deep Learning: A Textbook. New York (NY): Springer Nature.

Asadoorian M.O & Kantarelis D. (2005). Essentials of Inferential Statistics. Lanham (MD): University Press of America.

Beccaria F., Gagliardi G. & Kokkinakis D. (2022) Extraction and Classification of Acoustic Features from Italian Speaking Children with Autism Spectrum Disorders. In D. Kokkinakis et al. (eds.), Proceedings of the LREC 2022 workshop on: Resources and ProcessIng of linguistic, para-linguistic and extra-linguistic Data from people with various forms of cognitive/psychiatric/developmental impairments (RaPID-4 2022). Paris: ELRA, 22–30.

Beltrami D., Gagliardi G., Rossini Favretti R., Ghidoni E., Tamburini F. & Calzà L. (2018). Speech analysis by Natural Language Processing techniques: A possible tool for very early detection of cognitive decline? Frontiers in Aging Neuroscience, 10: 369.

Braga R.M., DiNicola L.M., Becker H.C.B. & Buckner R.L. (2020). Situating the left-lateralized language network in the broader organization of multiple specialized large-scale distributed networks. Journal of Neurophysiology, 124: 1415–1448.

Catani M., Jones D. K., & Ffytche D. H. (2005). Perisylvian language networks of the human brain. Annals of Neurology, 57(1):8–16.

Catani M., Dell’Acqua F., Bizzi A., Forkel S. J., Williams S. C., Simmons A., Murphy D. G. & Thiebaut de Schotten M. (2012). Beyond cortical localization in clinico-anatomical correlation. Cortex, 48(10):1262–1287.

De Souza D.D., Robin. J., Gumus M. & Yeung A. (2021) Natural Language Processing as an emerging tool to detect Late-Life Depression. Frontiers in Psychiatry, 12: 719125.

Dellwo V. (2006). Rhythm and Speech Rate: A Variation Coefficient for DeltaC. In P. Karnowski, I. Szigeti (eds.), Language and Language-Processing. Frankfurt am Main: Peter Lang, 231–241.

Devlin J., Chang M.-W., Lee K., & Toutanova K. (2019) Bert: pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Volume 1 (Long and Short Papers). Stroudsburg (PA): ACL, 4171-4186.

Eyben F., Wöllmer M. & Schuller B. (2010) OpenSMILE: the munich versatile and fast open-source audio feature extractor. In Del Bimbo, A., Chang, S.F. & Smeulders, A. (eds.) MM '10: Proceedings of the 18th ACM international conference on Multimedia. New York (NY): ACM, 1459-1462.

Fraser K.C., Lundholm Fors K., Eckerström M., Öhman F. & Kokkinakis D. (2019). Predicting MCI status from multimodal language data using cascaded classifiers. Frontiers in Aging Neuroscience, 11: 205.

Hagoort P. (2017). The core and beyond in the language-ready brain. Neuroscience and biobehavioral reviews, 81(Pt B):194–204.

Haider F., de la Fuente S. & Luz S. (2020) An assessment of paralinguistic acoustic features for detection of Alzheimer’s Dementia in spontaneous speech. IEEE Journal of Selected Topics in Signal Processing, 14(2), 272-281.

Hertrich I., Dietrich S. & Ackermann H. (2020). The margins of the language network in the brain. Frontiers in Communication, 5:93.

Imre N., Balogh R., Gosztolya G., Tóth L., Várkonyi T., Lengyel C. Pákáski M. & Kálmán J. (2019) Automatic recognition of temporal speech features in type 2 diabetes mellitus with mild cognitive impairment. In Baranyi, P., Esposito, E., Maldonato, M. & Vogel, C. (eds.)10th IEEE International Conference on Cognitive Infocommunications (CogInfoCom). Piscataway (NJ): IEEE, 27-28.

Jo T. (2021) Machine Learning Foundations: Supervised, Unsupervised, and Advanced Learning. Cham: Springer Nature.

König A., Zeghari R., Guerchouche R., Duc Tran M., Bremond F., Linz N., Lindsay H., Langel K., Ramakers I., Lemoine P., Bultingaire V. & Robert P. (2021). Remote cognitive assessment of older adults in rural areas by telemedicine and automatic speech and video analysis: protocol for a cross-over feasibility study. BMJ Open, 11: e047083.

Liu H. (2011). Feature Selection. In C. Sammut, G.I. Webb (eds.), Encyclopedia of Machine Learning. Boston (MA): Springer, 402–406.

Locke S., Bashall A., Al-Adely S., Moore J., Wilson A. & Kitchen G.B. (2021). Natural language processing in medicine: A review. Trends in Anaesthesia and Critical Care, 38: 4-9.

López-de-Ipiña K., Alonso J.B., Solé-Casals J., Barroso N., Henriquez P., Faundez-Zanuy M., Travieso- Gonzalez C.M., Ecay-Torres M., Martínez-Lage P. & Eguiraun H. (2015). On automatic diagnosis of Alzheimer’s Disease based on spontaneous speech analysis and emotional temperature. Cognitive Computation, 7: 44-55.

Lopez-de-Ipiña K., Martinez-de-Lizarduy U., Calvo P.M. Mekyska J., Beitia B., Barroso N., Estanga A., Tainta M. & Ecay-Torres M. (2018) Advances on automatic speech analysis for early detection of Alzheimer Disease: a non-linear multi-task approach. Current Alzheimer Research, 15: 139-148.

Mann C.J. (2003). Observational research methods. Research design II: cohort, cross sectional, and case-control studies. Emergency Medicine Journal, 20: 54-60.

Mitchell T.M. (1997). Machine Learning. New York (NY): McGraw-Hill.

Patel S.P., Nayar K., Martin G.E., Franich K., Crawford S., Diehl J.J. & Losh M. (2020). An Acoustic Characterization of Prosodic Differences in Autism Spectrum Disorder and First-Degree Relatives. Journal of autism and developmental disorders, 50(8):3032-3045.

Rahman, W., Lee, S., Islam, M.S., Antony, V.N., Ratnu, H., Ali, M.R., Mamun, A.A., Wagner, E., Jensen-Roberts, S., Waddell, E., Myers, T., Pawlik, M., Soto, J., Coffey, M., Sarkar, A., Schneider, R., Tarolli, C., Lizarraga, K., Adams, J., Little, M.A., Dorsey, E.R. & Hoque, E. (2021) Detecting Parkinson Disease using a web-based speech task: observational study. Journal of medical Internet research, 23(10): e26305.

Ramus F., Nespor M. & Mehler J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition, 73: 265-292.

Shinkawa K., Kosugi A., Nishimura M., Nemoto M., Nemoto K., Takeuchi T., Numata Y., Watanabe R., Tsukada E., Ota M., Higashi S., Arai T. & Yamada Y. (2019). Multimodal behavior analysis towards detecting mild cognitive impairment: preliminary results on gait and speech. Studies in health technology and informatics, 264: 343-347.

Tanaka H., Adachi H., Ukita N., Ikeda M., Kazui H., Kudo T. & Nakamura S. (2017) Detecting dementia through interactive computer avatars. IEEE Journal of Translational Engineering in Health and Medicine, 5: 2200111.

Tang S.X., Cong Y., Nikzad A.H., Mehta A., Cho S., Hänsel K., Berretta S., Dhar A.A., Kane J.M. & Malhotra A.K. (2022). Clinical and computational speech measures are associated with social cognition in schizophrenia spectrum disorders. Schizophrenia Research, doi: 10.1016/j.schres.2022.06.012.

Tausczik Y. & Pennebaker J. (2010) The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1): 24–54.

Wang J., Deng H., Liu B., Hu A., Liang J., Fan L., Zheng X., Wang T. & Lei J. (2020). Systematic evaluation of research progress on natural language processing in medicine over the past 20 years: Bibliometric study on pubmed. Journal of Medical Internet Research, 22(1): e16816.

Yang Q., Zhang Y., Dai W., Pan S.J. (2020). Transfer Learning. Cambridge: Cambridge University Press.