MaDrIgAL: MultiDimensional Interaction management and Adaptive Learning

Lead Research Organisation: Heriot-Watt University
Department Name: School of Mathematical and Computer Sciences

Abstract

As tech giants like Google, Facebook, Apple and Microsoft continue to invest in speech technology, the global voice recognition market is projected to reach a value of $133 billion by 2017 (companiesandmarkets.com, 2015). Speech-enabled interactive systems in particular, such as Apple's Siri and Microsoft's Cortana, are starting to show significant economic impact, with the virtual personal assistant (VPA) market estimated to grow from $352 million in 2012 to over $3 billion in 2020 (Grand View Research, 2014).

Although such commercial systems allow consumers to use their voice in interacting with their devices and services, the user experience is still limited due to the lack of naturalness of the conversations and limited social intelligence of the VPA. Moreover, the quality of these user interfaces relies on large, carefully crafted rule sets, making development labour-intensive and not scalable to new application domains. With the emergence of the Internet of Things and voice control in the smart home, there is a huge demand for scalable development of natural conversational interfaces across task domains.

MaDrIgAL will develop a radically new approach to building interactive spoken language interfaces by exploiting the multi-dimensional nature of natural language conversation: in addition to carrying out the underlying task or activity, participants in a dialogue simultaneously address several other aspects of communication, such as giving and eliciting feedback and adhering to social conventions. In analogy to the singing voices in a madrigal, simultaneous processes for each dimension operate in harmony to produce multifunctional, natural utterances. Consider the two alternative responses S2a and S2b in the following example:

U1: Hello, I would like to book a flight to London.
S2a: Which date did you have in mind?
S2b: Ok, flying to London on what date?

Whereas S2a only asks for the next piece of information to book the flight (uni-dimensional), S2b also gives feedback about the arrival city, allowing the user to correct any recognition errors (multi-dimensional). We aim to develop a principled multidimensional modelling and learning framework that covers a wide range of different phenomena, including the implicit confirmation in S2b.
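The contrast between S2a and S2b can be made concrete with a small data-structure sketch. This is only an illustration, not the project's implementation: the class `DialogueAct`, the helper functions, and the labels "Task", "AutoFeedback", "SetQuestion" and "AutoPositive" are assumptions, loosely modelled on the dimension and communicative-function names used in dialogue act annotation schemes.

```python
from dataclasses import dataclass, field


@dataclass
class DialogueAct:
    """One communicative function in one dimension (hypothetical representation)."""
    dimension: str                 # illustrative label, e.g. "Task" or "AutoFeedback"
    function: str                  # illustrative label, e.g. "SetQuestion"
    content: dict = field(default_factory=dict)


def dimensions(acts):
    """Return the set of dimensions addressed by one utterance."""
    return {act.dimension for act in acts}


def is_multidimensional(acts):
    """An utterance is multi-dimensional if it addresses more than one dimension."""
    return len(dimensions(acts)) > 1


# S2a: "Which date did you have in mind?" (task question only)
S2A = [DialogueAct("Task", "SetQuestion", {"slot": "date"})]

# S2b: "Ok, flying to London on what date?" (feedback plus task question)
S2B = [
    DialogueAct("AutoFeedback", "AutoPositive", {"destination": "London"}),
    DialogueAct("Task", "SetQuestion", {"slot": "date"}),
]

if __name__ == "__main__":
    for name, acts in (("S2a", S2A), ("S2b", S2B)):
        print(name, sorted(dimensions(acts)), is_multidimensional(acts))
```

Under this representation, the implicit confirmation in S2b is simply a second act in a different dimension, attached to the same utterance as the task question.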

This multi-dimensional approach will not only allow us to build systems that support more natural and effective interactions with users, but will also enable cost-effective development of such interfaces for a variety of domains by learning transferable conversational skills (e.g., selecting actions in domain-independent dimensions). We will therefore demonstrate our approach by building interactive spoken language interfaces for multiple application domains in a home automation scenario, allowing users to interact with, for example, their Smart TV or heating control system. We will collaborate closely with the industrial partner SemVox to explore this scenario.

The project will bring together expertise in statistical machine learning approaches to state-of-the-art spoken dialogue systems and natural language generation, as well as linguistic theories of multi-dimensional dialogue modelling (collaborating in particular with academic partner Prof. Bunt). MaDrIgAL will develop Next Generation Interaction Technologies relevant to Health Technology and Assisted Living, as well as tackle the question of a common user interface to the Internet of Things and Big Data.

Planned Impact

From an end-user point of view, this work will be of interest to a wide range of businesses in the UK operating in areas such as speech technology, AI and home automation. Specifically, we will develop and release a new spoken dialogue system (SDS) architecture compliant with the recently developed ISO standard for dialogue act annotation. This will promote interoperability, enabling the developer community to collaborate and reuse each other's components more easily, and ultimately help industry to develop products with spoken language interfaces more efficiently. The proposed research will also help to strengthen the impact of statistical SDS. Development costs for statistical techniques currently seem too high for manufacturers, because sufficient domain-specific data is required. The proposed new modular architecture indirectly addresses the economic feasibility of developing products with interactive interfaces for a variety of application domains.

This will ultimately help the UK to push past the US, which is still the main competitor in this area, despite the emergence of new language technology companies such as VocalIQ and Arria. For example, a senior speech scientist at Apple has said publicly that statistical methods will form the algorithmic basis for Siri in the near future. These increased capabilities will also enable us to tackle more complex interaction scenarios, such as social robotics, situated multimodal dialogue with smart devices, and eventually controlling and managing the Internet of Things.

From a societal point of view, this project will create a unified user interface that allows everyday users to access and control the Internet of Things intuitively using natural language, and as such lowers the barrier to accessing and benefiting from new technology. This is especially relevant for elderly and/or disabled users, for example in a home automation scenario.
 
Description Heriot-Watt University is currently involved in a couple of knowledge transfer collaborations with industry, including EmoTech LTD and Amazon.com, where we co-create and develop the methods proposed in this grant. For example, we were selected to participate in the prestigious Amazon Alexa challenge, which gives us a great test bed for our methods. Furthermore, Heriot-Watt University has created a new MSc programme in AI with Speech and Multimodal Interaction, where methods and techniques developed as part of this proposal are taught to students in new courses, such as F20/21CA Conversational Agents.
First Year Of Impact 2016
Sector Digital/Communication/Information Technologies (including Software),Education
Impact Types Economic
 
Description New MSc Programme in Speech and Multimodal Interaction
Geographic Reach National 
Policy Influence Type Influenced training of practitioners or researchers
Impact Verena Rieser created a new postgraduate MSc programme at Heriot-Watt, which aims to educate highly employable experts in creating conversational multimodal interfaces. The programme recently received six fully funded studentships from the DataLab / Scottish Funding Council.
URL http://www.macs.hw.ac.uk/cs/pgcourses/aiws.htm
 
Description DataLab MSc scholarships
Amount £36,000 (GBP)
Organisation Scottish Funding Council (SFC) - Research Grants Council (RGC) 
Sector Public
Country United Kingdom of Great Britain & Northern Ireland (UK)
Start 09/2017 
End 08/2018
 
Description DataLab knowledge exchange UK Industry
Amount £114,000 (GBP)
Organisation Scottish Funding Council (SFC) - Research Grants Council (RGC) 
Sector Public
Country United Kingdom of Great Britain & Northern Ireland (UK)
Start 12/2016 
End 12/2017
 
Description Amazon Alexa Challenge 
Organisation Amazon.com
Country United States of America 
Sector Private 
PI Contribution My team was selected to participate in the Amazon Alexa Challenge. The aim of this challenge is to build a social chat bot that can converse coherently and engagingly with humans on popular topics for 20 minutes.
Collaborator Contribution We received a generous gift of $100k and various in-kind contributions, e.g. free training and access to Amazon Web Services, Alexa-enabled devices, a weekly class with one of Amazon's senior researchers, and free travel to Amazon HQ in Seattle for the team.
Impact Increased recognition and visibility of my research group and department.
Start Year 2016
 
Description EmoTech North Industry Knowledge Exchange 
Organisation EmoTech Ltd
Country United Kingdom of Great Britain & Northern Ireland (UK) 
Sector Private 
PI Contribution We collaborate on designing and implementing a conversational interface for Olly the Robot, an in-home robot with conversational capabilities developed by Emotech Ltd. The Olly robot recently won 4 awards for Innovation at the CES showcase. (The CES Innovation Awards is an annual competition honoring outstanding design and engineering in consumer technology products from around the world.) It was recently showcased at CES '17: http://www.bbc.com/news/technology-38504512 The project outcome will contribute directly to Emotech's Olly product. Emotech will release 1000-1500 units in June/July via a Kickstarter program to gauge early adopter feedback. Full commercial release is expected in Q3/4 2017 at a retail price of $600-800 per unit. The revenue of Emotech Ltd in 2017 is estimated to be £2m, and is expected to grow to £20-40m in 2018. Emotech North Ltd will be an NLP (Natural Language Processing) hub for Emotech. Its growth will create more employment positions and more collaborations with other industry partners and universities in Scotland.
Collaborator Contribution Cash contribution of £58k to support RA. Invited research visit to London (1 week) fully supported.
Impact Robotics hardware, neuroscience, human-computer interaction
Start Year 2016
 
Description Conversational Agents course (HWU) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Contributed to the course 'Conversational Agents', attended by approximately 18 undergraduate students. The contribution included in particular the supervision of 5 of the students carrying out a project as part of their coursework. The project involved working with the dialogue system code developed so far in the Madrigal project and adapting it to a new domain (domain adaptation being one of the key aspects of Madrigal). This activity has enabled knowledge transfer to students and given them an opportunity to get both theoretical and practical experience with developing dialogue systems. Moreover, the interaction with the students and their feedback has triggered major improvements and further development of the Madrigal software.
Year(s) Of Engagement Activity 2017
URL https://sites.google.com/site/olemon/conversational-agents
 
Description Diversity and inclusion in academic ICT research 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Study participants or study members
Results and Impact I am taking part in the focus group Diversity and inclusion in academic ICT research run by the EPSRC and organised by Edinburgh Napier University.
Year(s) Of Engagement Activity 2017
URL https://www.epsrc.ac.uk/newsevents/news/ictdiversityinclusionresearch/
 
Description ECML-PKDD conference (Riva del Garda, Italy) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Attended the ECML-PKDD 2016 conference and gave a talk in one of its workshops, introducing the Madrigal project to peer researchers from both academia and industry. This conference visit also enabled many discussions with specialists in the field of machine learning, which was the topic of the overall conference, and an important aspect of the Madrigal project. The trip also included discussion with project collaborator Prof. Harry Bunt, who gave an invited talk at the workshop. The discussion included the planning of mutual research visits in 2017.
Year(s) Of Engagement Activity 2016
URL http://www.ecmlpkdd2016.org
 
Description Invited industry talk at Thomson Reuters 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Verena Rieser was invited to present her research to Thomson Reuters via an online seminar. This seminar will be broadcast to all research employees of Thomson Reuters worldwide.
Year(s) Of Engagement Activity 2017
 
Description Invited seminar talk at the University of Pennsylvania, US. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Verena Rieser gave an invited seminar talk at the University of Pennsylvania on: "From Dialogue Systems to Social Chatbots: Reinforcement Learning, Seq2Seq, and back again"
Year(s) Of Engagement Activity 2017
URL https://pricelab.sas.upenn.edu/clunch16-17
 
Description Native Scientist German School Outreach 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Verena Rieser engaged school children in her research. The half-day event was organised by Alleman Fun (German Saturday School) and Native Scientist. The engagement activity was held in German.
Year(s) Of Engagement Activity 2016
URL http://www.macs.hw.ac.uk/RoboticsLab/news/german-native-scientist-volunteers-reaching-out-to-childre...
 
Description Women@CS 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Undergraduate students
Results and Impact Verena Rieser organises a local support group for female students studying Computer Science, inspired by the "Sisters Clubs" in American universities. The goal is to attract female UG students to CS and to retain them.
Year(s) Of Engagement Activity 2016