Context modelling for visually grounded response selection and generation

dc.contributor.advisor: Konstas, Dr. Yannis
dc.contributor.advisor: Rieser, Prof. Verena
dc.contributor.author: Agarwal, Shubham
dc.date.accessioned: 2023-02-21T12:47:00Z
dc.date.available: 2023-02-21T12:47:00Z
dc.date.issued: 2021-08
dc.description.abstract: With recent progress in deep learning, there has been increased interest in visually grounded dialogue, which requires an AI agent to hold a meaningful natural-language conversation with humans about visual content in other modalities, e.g. pictures or videos. This thesis contributes improved context-modelling techniques for multimodal, visually grounded response selection and generation. We show that relevant context encodings enable a system to respond more accurately and more helpfully to user requests, and that different types of context encoding suit different multimodal visually grounded tasks and datasets. In particular, the thesis focuses on two scenarios: response generation for task-based multimodal search and open-domain response selection for image-grounded conversations. For these tasks, it contributes new models for context encoding, including knowledge grounding, history encoding, and multimodal fusion. Throughout, the thesis provides an in-depth critical analysis of the shortcomings of current models, tasks, and evaluation metrics.
dc.identifier.uri: http://hdl.handle.net/10399/4620
dc.language.iso: en
dc.publisher: Heriot-Watt University
dc.publisher: Mathematical and Computer Sciences
dc.title: Context modelling for visually grounded response selection and generation
dc.type: Thesis
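
To make the abstract's mention of "multimodal fusion" and "encoding history" concrete, here is a minimal sketch of one common context-encoding pattern. This is not the thesis's model: it assumes a PyTorch-style late-fusion encoder, and every class name, dimension, and hyperparameter below is illustrative.

    import torch
    import torch.nn as nn

    class LateFusionEncoder(nn.Module):
        # Illustrative late fusion: encode the flattened dialogue history
        # with a GRU, then fuse it with a precomputed image feature vector
        # by concatenation and a linear projection into a joint space.
        def __init__(self, vocab_size=10000, embed_dim=256,
                     hidden_dim=512, img_feat_dim=2048):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.history_rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
            self.fusion = nn.Linear(hidden_dim + img_feat_dim, hidden_dim)

        def forward(self, history_tokens, img_feats):
            # history_tokens: (batch, seq_len) token ids of the dialogue so far
            # img_feats:      (batch, img_feat_dim), e.g. pooled CNN features
            emb = self.embed(history_tokens)
            _, h_n = self.history_rnn(emb)         # h_n: (1, batch, hidden_dim)
            fused = torch.cat([h_n.squeeze(0), img_feats], dim=-1)
            return torch.tanh(self.fusion(fused))  # joint context encoding

    # Usage: the fused vector can score candidate responses (selection)
    # or condition a decoder (generation).
    enc = LateFusionEncoder()
    ctx = enc(torch.randint(0, 10000, (2, 30)), torch.randn(2, 2048))
    print(ctx.shape)  # torch.Size([2, 512])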

Files

Original bundle

Name: AgarwalS_0821_macsSS.pdf
Size: 23.47 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission