Incremental multi-party conversational AI for people with dementia
Abstract
Spoken dialogue systems (SDSs, e.g. Siri and Alexa) are trained on huge corpora, helping
them accurately understand the ‘average’ user. Speech production is nuanced, however,
so some user groups fall outside the ‘average’. This thesis focuses on SDSs for people
with dementia (PwD). More naturally interactive and accessible SDSs can improve people’s autonomy at home and in public spaces. This thesis tackles three challenges:
ethical data collection, incrementality, and multi-party conversations (MPCs).
Part I details the motivations of this work, in the context of voice assistant accessibility,
with a specific focus on language technologies for people with dementia. The thesis is
then introduced in its entirety through published paper summaries, with a structure guide.
Part II focuses on data collection. A framework is presented to ensure all data is
collected ethically. A data capture device is then presented to address novel challenges
introduced by COVID-19. Using the ethical framework and device, the DEICTIC corpus
was collected. It confirmed that, when talking to an SDS, PwD pause significantly more
often, and for significantly longer durations, than people without dementia. The corpus
also revealed that 28% of PwD’s interactions with an SDS are MPCs involving their partner. SDSs are not adapted for MPCs, so a second data collection was designed. Hospital
staff subsequently used this design with memory clinic patients and their companions.
Part III focuses on incrementality. Of the services evaluated, Microsoft’s incremental speech recognition was the most
responsive, stable, and accurate, and the only one that preserved disfluent material. IBM’s
services were the most suitable for MPCs. Two corpora were created and released to
explore incremental semantic parsing, together containing over 105,000 interrupted utterances, each paired with its underspecified meaning representation. SDSs interrupt users
who pause too long mid-utterance, forcing them to repeat themselves, which is frustrating.
The use of incremental clarification requests (iCRs, e.g. “author of what?”) leads to more
naturally interactive SDSs, and improves their accessibility for PwD. Another new corpus
was created and released, containing 3,000 human-elicited clarification requests. It was
used to show that some large language models (LLMs) can generate context-appropriate
iCRs, and can interpret clarification exchanges as if they were one uninterrupted turn.
Part IV tackles MPCs. The hospital corpus showed that MPCs elicit unique, complex behaviours. LLMs performed remarkably well at the new task of multi-party goal tracking, when
given examples from the corpus. A multi-party SDS is required for further research, so
all the work presented in this thesis was integrated into one system, embodied by an ARI robot. The system is designed to handle MPCs with memory clinic patients and their companions, and to be accessible for PwD. When PwD pause mid-utterance, the
system generates an appropriate iCR, and interprets the resulting clarification exchange.
In summary, this thesis shows that PwD pause significantly more often, and for significantly longer durations, than people without dementia. Additionally, their interactions with an SDS
are often multi-party. When mid-utterance pauses occur, interactions can be recovered
through the use of iCRs. Using the SLUICE-CR corpus, LLMs can generate effective
and human-like iCRs. LLMs can also interpret clarification exchanges and multi-party interactions. This work was integrated and deployed on a social robot
to enable conversations between the robot, memory clinic patients, and their companions.