Augment Intelligence with Multimodal Information
Humans interact with other humans and with the world through information from various channels, including vision, audio, language, and haptics. To simulate intelligence, machines require similar abilities to process and combine information from different channels in order to acquire better situational awareness, communication ability, and decision-making ability. In this talk, we describe three projects. In the first study, we enable a robot to utilize both vision and audio information to achieve better user understanding.
Then we use incremental language generation to improve the robot's communication with a human. In the second study, we utilize multimodal history tracking to optimize policy planning in task-oriented visual dialogs. In the third project, we tackle the well-known trade-off between dialog response relevance and policy effectiveness in visual dialog generation. We propose a new machine learning procedure that alternates between supervised learning and reinforcement learning to jointly optimize language generation and policy planning in visual dialogs.
We will also cover some recent ongoing work on image synthesis through dialogs, and on generating social multimodal dialogs with a blend of GIFs and words.
Zhou Yu is an Assistant Professor in the Computer Science Department at UC Davis. She received her PhD from Carnegie Mellon University in 2017. Zhou is interested in building robust and multi-purpose dialog systems that require less data and annotation. She also works on language generation and on vision-and-language tasks. Zhou's work on persuasive dialog systems recently received an ACL 2019 best paper nomination. She was featured in the Forbes 2018 30 Under 30 in Science list for her work on multimodal dialog systems, and her team recently won the 2018 Amazon Alexa Prize, with its $500,000 cash award, for building an engaging social bot.
The recent proliferation of conversational AI creatures is still navigating superficially in shallow waters with regard to language understanding and generation. Accordingly, these new types of creatures are failing to dive properly into the deep oceans of human-like use of language and intelligence. FINDING NEMD (New Evaluation Metrics for Dialogue) is an epic journey across the seas of data and data-driven applications to tame its conversational AI creatures for the benefit of science and humankind.
Rafael is a Senior Research Scientist at Intapp Inc. His research focuses on applying NLP technologies to problems in the professional services industry. He is also an Adjunct Associate Professor at Nanyang Technological University (NTU) in Singapore, where he supervises student projects on question answering and conversational-agent applications. He has previous experience organizing workshops at ACL and other international conferences, including the workshop series on Named Entities (NEWS), Conversational Agents (WOCHAT), and Machine Translation (HyTra).
Better dialogue generation!
Generative dialogue models currently suffer from a number of problems that standard maximum likelihood training does not address. They tend to produce generations that rely too much on copying, contain repetitions, overuse frequent words, and, at a deeper level, contain logical flaws. We describe how all of these problems can be addressed by extending the recently introduced unlikelihood loss (Welleck et al., 2019) to these cases. We show that appropriate loss functions that regularize generated outputs to match human distributions are effective for the first three issues. For the last, more general issue, we show that applying unlikelihood to collected data of what a model should not do is effective for improving logical consistency, potentially paving the way to generative models with greater reasoning ability.
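To make the idea concrete, here is a minimal sketch of how unlikelihood training differs from maximum likelihood: alongside the usual negative log-likelihood of the target token, it adds a penalty on probability mass assigned to negative candidates (e.g. tokens that would create a repetition). The function names, the toy dictionary-of-probabilities representation, and the `alpha` weighting are illustrative assumptions, not the paper's implementation.

```python
import math

def mle_loss(probs, target):
    # Standard maximum likelihood term: -log p(target | context).
    return -math.log(probs[target])

def unlikelihood_loss(probs, negative_candidates):
    # Penalize probability mass placed on tokens the model should NOT
    # generate, e.g. tokens that repeat recent context: -log(1 - p(c)).
    return -sum(math.log(1.0 - probs[c]) for c in negative_candidates)

def total_loss(probs, target, negative_candidates, alpha=1.0):
    # Combined objective: likelihood of the target plus a weighted
    # unlikelihood penalty on the negative candidates (alpha is an
    # assumed mixing weight for this sketch).
    return mle_loss(probs, target) + alpha * unlikelihood_loss(probs, negative_candidates)

# Toy next-token distribution over a 3-word vocabulary.
probs = {0: 0.7, 1: 0.2, 2: 0.1}
loss = total_loss(probs, target=0, negative_candidates=[1])
```

Intuitively, the more probability the model puts on a negative candidate, the larger the `-log(1 - p)` penalty grows, pushing mass away from repetitions and overused tokens.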
Jason Weston is a research scientist at Facebook, NY and a Visiting Research Professor at NYU. He earned his PhD in machine learning at Royal Holloway, University of London and at AT&T Research in Red Bank, NJ (advisors: Alex Gammerman, Volodya Vovk and Vladimir Vapnik) in 2000. Since then, he has worked at Biowulf Technologies, the Max Planck Institute for Biological Cybernetics, and Google Research. Jason has published over 100 papers, including best paper awards at ICML and ECML, and a Test of Time Award for his work "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning", ICML 2008 (with Ronan Collobert). He was part of the YouTube team that won a National Academy of Television Arts & Sciences Emmy Award for Technology and Engineering for Personalized Recommendation Engines for Video Discovery. He was listed as the 16th most influential machine learning scholar by AMiner and one of the top 50 authors in Computer Science in Science.