Mathematical & Computer Sciences

Permanent URI for this community: https://dspace-upgrade.is.ed.ac.uk/handle/10399/20

  • Context modelling for visually grounded response selection and generation
    (Heriot-Watt University, 2021-08) Agarwal, Shubham; Konstas, Doctor Yannis; Rieser, Professor Verena
    With recent progress in deep learning, there has been increased interest in visually grounded dialog, which requires an AI agent to hold a meaningful natural-language conversation with humans about visual content in other modalities, e.g. pictures or videos. This thesis contributes improved context modelling techniques for multimodal visually grounded response selection and generation. We show that knowing about relevant context encodings enables a system to respond more accurately and more helpfully to the user request. We also show that different types of context encodings are relevant for different multimodal visually grounded tasks and datasets. In particular, this thesis focuses on two specific scenarios: response generation for task-based multimodal search and open-domain response selection for image-grounded conversations. For these tasks, the thesis contributes new models for context encoding, including knowledge grounding, encoding history, and multimodal fusion. Across these tasks, the thesis provides an in-depth critical analysis of the shortcomings of current models, tasks and evaluation metrics.
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by the author's copyright.