Cognitive problems, such as memory loss, speech and language impairment, and reasoning difficulties, occur frequently among older adults and often precede the onset of dementia syndromes. Due to the high prevalence of dementia worldwide, research into cognitive impairment for the purposes of dementia prevention and early detection has become a priority in healthcare. There is a need for cost-effective and scalable methods for assessment of cognition and detection of impairment, from its most subtle forms to severe manifestations of dementia. Speech is an easily collectable behavioural signal which reflects cognitive function, and therefore could potentially serve as a digital biomarker of cognitive function, presenting a unique opportunity for application of speech technology. While most studies to date have focused on English speech data, the TAUKADIAL Challenge aims to explore speech as a marker of cognition in a global health context, providing data from two major languages, namely, Chinese and English. The TAUKADIAL Challenge's tasks will focus on prediction of cognitive test scores and diagnosis of mild cognitive impairment (MCI) in older speakers of Chinese and English, using samples of connected speech. We expect that approaches that are language independent will be favoured. This INTERSPEECH Challenge will bring together members of the speech, signal processing, machine learning, natural language processing and biomedical research communities, enabling them to test existing methods or develop novel approaches on a new shared standardised dataset which will remain available to the community for future research and replication of results.
To register for the TAUKADIAL Challenge and gain access to the TAUKADIAL dataset, please email taukadial2024@ed.ac.uk with your contact information and affiliation. Full access to the dataset will be provided through DementiaBank membership. To become a member, please include in your email to taukadial2024@ed.ac.uk a general statement of how you plan to use the data, with a specific mention of the TAUKADIAL Challenge. If you are a student, please ask your supervisor to join DementiaBank as well.
The TAUKADIAL challenge encompasses the following tasks: Task 1, MCI classification, and Task 2, prediction of MMSE scores.
You may choose to do one or both of these tasks. You will be provided with access to a training set (see relevant section below), and two weeks prior to the paper submission deadline you will be provided with test sets on which you can test your models.
You may send up to five sets of results to us for scoring for each task. You are required to submit all your attempts together, in separate files named: taukadial_results_task1_attempt1.txt, taukadial_results_task2_attempt1.txt (or one of these, should you choose not to enter both tasks). These must contain the IDs of the test files and your model's predictions. The test set archives will contain README.md files with further details.
As the broad scientific goal of TAUKADIAL is to gain insight into the nature of the relationship between speech and cognitive function across different languages, we encourage you to upload a paper describing your approaches and results to a pre-print repository such as arXiv or medRxiv, and to submit your paper to INTERSPEECH, regardless of your position in the ranking. Note, however, that for INTERSPEECH submissions, "online posting of any version* of the paper under submission is forbidden during an anonymity period starting one month prior to the Interspeech submission deadline and up to the moment the accept/reject decisions are announced". Any submissions to pre-print repositories should therefore comply with this policy.
We also encourage you to share your code through a publicly accessible repository, if possible using a literate programming "notebook" environment such as R Markdown or Jupyter Notebook.
The data set has been balanced with respect to age and sex in order to eliminate potential confounding and bias. We employed a propensity score approach to matching (Rosenbaum & Rubin, 1983; Rubin, 1973; Ho et al., 2007). The data set contains both Chinese and English audio files with recordings of picture descriptions. There are 3 picture descriptions per participant. The file names are in the following format: taukdial-MMM-N.wav, where MMM is a random integer, and N is an integer between 1 and 3 (inclusive) indicating the picture description contained in the recording. Note that the three pictures used in the English descriptions are different from the three pictures described by the Chinese speakers.
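As an illustration of the naming scheme above, the following minimal sketch parses file names of the form taukdial-MMM-N.wav and groups the recordings by participant. The function and class names are hypothetical, and the sketch assumes that MMM is written as a plain run of digits (whether it is zero-padded is not stated here). Grouping by participant may be useful when building cross-validation folds, so that the three descriptions from one speaker stay together.

```python
import re
from typing import Dict, List, NamedTuple

# Assumed pattern: "taukdial-", a run of digits (MMM), "-", a picture index 1..3, ".wav".
_NAME_RE = re.compile(r"^taukdial-(\d+)-([1-3])\.wav$")


class Recording(NamedTuple):
    participant: str  # the MMM part, kept as a string to preserve any leading zeros
    picture: int      # the N part, an integer from 1 to 3


def parse_recording_name(filename: str) -> Recording:
    """Split a file name such as 'taukdial-012-3.wav' into its two parts."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    return Recording(participant=match.group(1), picture=int(match.group(2)))


def group_by_participant(filenames: List[str]) -> Dict[str, List[str]]:
    """Map each participant identifier to the list of its recordings."""
    groups: Dict[str, List[str]] = {}
    for name in filenames:
        rec = parse_recording_name(name)
        groups.setdefault(rec.participant, []).append(name)
    return groups
```

For example, `group_by_participant(["taukdial-012-1.wav", "taukdial-012-2.wav"])` returns a dictionary with the single key `"012"`.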
Please email taukadial2024@ed.ac.uk to get access to the training set, as described above.
Test set
The test data are now available at DementiaBank (you will need your login details to download it). Please email taukadial2024@ed.ac.uk for instructions on how to submit your model's predictions.
The ground truth for test data is also available.
As the goal of the TAUKADIAL Challenge is to explore models that generalise across languages, we encourage participants to develop models encompassing features extracted from both languages. A possible architecture for a classification or regression system for this challenge would combine comparable features extracted from both languages into a single predictive model.
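One way to make features from the two languages comparable before pooling them is to standardise each feature within its own language. The sketch below is only an illustration under that assumption, not the organisers' method: the row layout (`lang`, `features`, `label`) and the function name are hypothetical, and in practice the means and standard deviations would be estimated on training data only.

```python
from statistics import mean, pstdev
from typing import Dict, List


def standardise_within_language(rows: List[dict]) -> List[dict]:
    """Z-score every feature column separately for each language, then pool.

    Each row is a dict with the (hypothetical) keys 'lang', 'features'
    (a list of floats, same length for all rows) and 'label'.
    The returned rows are grouped by language, in order of first appearance.
    """
    by_lang: Dict[str, List[dict]] = {}
    for row in rows:
        by_lang.setdefault(row["lang"], []).append(row)

    pooled = []
    for lang_rows in by_lang.values():
        # Transpose the feature vectors to get one tuple per feature column.
        columns = list(zip(*(r["features"] for r in lang_rows)))
        mus = [mean(col) for col in columns]
        # A constant column has zero spread; dividing by 1.0 leaves it at 0.
        sds = [pstdev(col) or 1.0 for col in columns]
        for r in lang_rows:
            z = [(x - mu) / sd for x, mu, sd in zip(r["features"], mus, sds)]
            pooled.append({**r, "features": z})
    return pooled
```

The pooled rows can then be passed to any single classifier or regressor, so that one model is trained on speakers of both languages.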
Task 1 (MCI classification) will be evaluated through specificity (\(\sigma\)), sensitivity (\(\rho\)) and \(F_1\) scores for the MCI category. These metrics will be computed as follows: \[ \sigma = \frac{TN}{TN + FP}, \] \[ \rho = \frac{TP}{TP + FN}, \] and \[ F_1 = \frac{2 \pi \rho}{\pi + \rho}, \] where the precision \(\pi\) is \[ \pi = \frac{TP}{TP + FP}, \] N is the number of patients, TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives and FN is the number of false negatives.
The balanced accuracy metric (unweighted average recall, UAR) will be used for the overall ranking of this task's results: \[ \displaystyle \operatorname {UAR} = {\frac { \sigma + \rho }{2} } \]
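The Task 1 metrics can be computed directly from the confusion counts. The following is a minimal sketch, not the organisers' scoring script; the label strings "MCI" and "NC" and the function name are assumptions made for illustration, and a ratio with a zero denominator is reported as 0.0 by convention.

```python
from typing import Dict, Sequence


def task1_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                  positive: str = "MCI") -> Dict[str, float]:
    """Specificity, sensitivity, precision, F1 and UAR for the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred != positive:
            tn += 1
        elif truth != positive and pred == positive:
            fp += 1
        else:
            fn += 1

    def ratio(num: float, den: float) -> float:
        # Convention used in this sketch: an undefined ratio counts as 0.0.
        return num / den if den else 0.0

    specificity = ratio(tn, tn + fp)   # sigma
    sensitivity = ratio(tp, tp + fn)   # rho
    precision = ratio(tp, tp + fp)     # pi
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    uar = (specificity + sensitivity) / 2
    return {"specificity": specificity, "sensitivity": sensitivity,
            "precision": precision, "f1": f1, "uar": uar}
```

Because UAR averages the recall of the two classes, it is not inflated by a model that always predicts the more frequent class.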
Task 2 (MMSE prediction) will be evaluated using the coefficient of determination: \[ R^2 = 1 - \frac{\sum_{i=1}^N (\hat{y}_{i} - y_{i})^2}{\sum_{i=1}^N (y_{i} - \bar{y})^2} \] and the root mean squared error: \[ \operatorname{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (\hat{y}_{i} - y_{i})^{2}}{N}} \] where \(\hat{y}_i\) is the predicted MMSE score for patient \(i\), \(y_i\) is the patient's actual MMSE score, \(\bar{y}\) is the mean of the actual scores, and N is the number of patients.
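The two Task 2 metrics can be sketched in a few lines. This is an illustration rather than the official scorer: the function names are hypothetical, and the coefficient of determination uses the standard definition in which the denominator is the total sum of squares of the actual scores around their mean.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between actual and predicted scores."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    y_bar = sum(y_true) / n
    ss_res = sum((p - t) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual scores are equal")
    return 1 - ss_res / ss_tot
```

Note that \(R^2\) can be negative when a model predicts worse than simply returning the mean of the actual scores.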
When more than one attempt is submitted for scoring against the test set, all results should be considered (not only the best result overall) and reported in the paper.
The ranking of submissions will be based on balanced accuracy (UAR) scores for the classification task (task 1), and on RMSE scores for the MMSE score regression task (task 2).
A paper describing the TAUKADIAL Grand Challenge and its dataset more fully, along with a basic set of baseline results, will be shared with the registered TAUKADIAL Challenge participants, and eventually submitted to INTERSPEECH. Papers submitted to this Challenge using the TAUKADIAL dataset should cite this paper.
We encourage you to submit papers describing your approaches to the tasks set here to https://arxiv.org/, after the INTERSPEECH anonymity period, and to share your code through open-source repositories. Please note that the intellectual property (IP) related to your submission is not transferred to the challenge organizers, i.e., if code is shared/submitted, the participants remain the owners of their code. When the code is made publicly available, an appropriate license should be added.
Important Dates
See Call for Papers and Author resources at the INTERSPEECH 2024 web site for instructions.
Saturnino Luz is Professor of Digital Biomarkers and Precision Medicine at the Usher Institute, University of Edinburgh's Medical School. He works in medical informatics, devising and applying machine learning, signal processing and natural language processing methods in the study of behaviour and communication in healthcare contexts. His main research interest is the computational modelling of behavioural and biological changes caused by neurodegenerative diseases, with a focus on the analysis of vocal and linguistic signals in Alzheimer's disease.
Sofia de la Fuente Garcia is a Teaching Fellow in Clinical Psychology at the School of Health in Social Science, University of Edinburgh. She completed a PhD in Precision Medicine at the same institution in 2020, an exploratory study of psycholinguistic, paralinguistic and acoustic features that may help predict dementia onset later in life. She continues to investigate speech technology for monitoring progression in the context of neurodegenerative diseases.
Fasih Haider is a Research Associate in Machine Learning at the School of Engineering, University of Edinburgh, UK. His areas of interest are Social Signal Processing and Artificial Intelligence. Before joining the Usher Institute, he was a Research Engineer at the ADAPT Centre where he worked on methods of Social Signal Processing for video intelligence. He holds a PhD in Computer Science from Trinity College Dublin, Ireland. Currently, he is investigating the use of social signal processing and machine learning for monitoring cognitive health.
Alyzza Lanzi is an Assistant Professor in the Department of Communication Sciences and Disorders at the University of Delaware, USA. Her research aims to develop and investigate evidence-based cognitive treatments that promote independence for adults with geriatric neurodegenerative conditions.
Davida Fromm is a Research Faculty member in the Psychology Department at Carnegie Mellon University. Her research interests have focused on aphasia, dementia, and apraxia of speech in adults. Since 2007, she has been working on the TalkBank project, developing large shared databases of multi-media interactions for the study of discourse in a variety of neurogenic communication disorders. The databases include resources for educational, clinical, and research applications.
Brian MacWhinney is Teresa Heinz Professor of Psychology, Computational Linguistics, and Modern Languages at Carnegie Mellon University. He received his Ph.D. in psycholinguistics in 1974 from the University of California at Berkeley. With Elizabeth Bates, he developed a model of first and second language processing and acquisition based on competition between item-based patterns. In 1984, he and Catherine Snow co-founded the CHILDES (Child Language Data Exchange System) Project for the computational study of child language transcript data. This system has extended to 13 additional research areas such as aphasiology, second language learning, TBI, Conversation Analysis, developmental disfluency and others in the shape of the TalkBank Project. MacWhinney's recent work includes studies of online learning of second language vocabulary and grammar, situationally embedded second language learning, neural network modeling of lexical development, fMRI studies of children with focal brain lesions, and ERP studies of between-language competition. He is also exploring the role of grammatical constructions in the marking of perspective shifting, the determination of linguistic forms across contrasting time frames, and the construction of mental models in scientific reasoning. Recent edited books include The Handbook of Language Emergence (Wiley) and Competing Motivations in Grammar and Usage (Oxford).
Chia-Ju Chou is a Postdoctoral Researcher in the Department of Neurology at Cardinal-Tien Hospital, Taiwan. Her research focuses on investigating reading comprehension in individuals with mild cognitive impairment and aphasia using ERP, fMRI and eye tracking. She applies these techniques to improve clinical diagnosis and rehabilitation assessment. Her current work focuses on collecting and analysing speech samples from various tasks to identify linguistic features that may indicate cognitive decline.
Ya-Ning Chang is an Assistant Professor at the Miin Wu School of Computing, National Cheng Kung University, Taiwan. She obtained her PhD in Psychology at the University of Manchester, UK. Dr Chang's primary research interest lies in the broad fields of artificial intelligence and cognitive science. Her work involves using a combined approach of computational modelling, behavioural studies, neuroimaging techniques, and corpus analysis to investigate various aspects of language processing and semantic cognition, and how cognitive processes are related to education, learning, and memory. She has applied computational modelling to language processing in different populations including children, adults, and patients suffering from language impairment (e.g., aphasia). Her recent work focuses on bringing together natural language processing and the application of developing various cognitive measures to support human-like communication in conversational agents and a real-life diagnosis of language disorders.
Yi-Chien Liu is a clinical neurologist from Taipei, Taiwan. Currently, he is director of the Neurology Department at Cardinal Tien Hospital. He holds a Ph.D. in Geriatric Cognitive Neuroscience from Tohoku University, Japan. His primary research areas encompass Alzheimer’s disease, mild cognitive impairment, primary progressive aphasia, and various other neurodegenerative disorders affecting the elderly. Currently, he is spearheading a research project centered around a memory clinic-based AD (Alzheimer's Disease) continuum cohort.