Amazon Alexa is the Founding Sponsor of Interspeech 2018!
The Alexa team looks forward to meeting you at Interspeech 2018! Come and visit our booth to learn more about our research and career opportunities. Below is more information about our technology and team.
Technologies We Focus On
The Alexa Science team made the magic of Alexa possible, but that was just the beginning. Our goal is to make voice interfaces ubiquitous and as natural as speaking to a human. We have a relentless focus on the customer experience and customer feedback. We use many real-world data sources including customer interactions and a variety of techniques like highly scalable deep learning. Learning at this massive scale requires new research and development. The team is responsible for cutting-edge research and development in virtually all fields of Human Language Technology: Automatic Speech Recognition (ASR), Artificial Intelligence (AI), Natural Language Understanding (NLU), Question Answering, Dialog Management, and Text-to-Speech (TTS). See an interview with VP Rohit Prasad here.
Alexa scientists and developers have a large-scale impact on customers’ lives and on the industry-wide shift to voice user interfaces. Scientists and engineers on the Alexa team also invent new tools and APIs that accelerate the development of voice services, empowering developers through the Alexa Skills Kit and the Alexa Voice Service. For example, developers can now create a new voice experience by simply providing a few sample sentences.
Your discoveries in speech recognition, natural language understanding, deep learning, and other disciplines of machine learning can fuel new ideas and applications that have a direct impact on people’s lives. We firmly believe that our team must engage deeply with the academic community and be part of the scientific discourse. There are many opportunities for presentations at internal machine learning conferences, which act as a springboard for publications at premier conferences. We also partner with universities through the Alexa Prize.
3 questions with Prem Natarajan, head of Alexa AI's Natural Language Understanding organization and member of the Interspeech organizing committee
The theme of this year’s Interspeech is “speech research for emerging markets in multilingual societies”. How is India, where the conference is being held, representative of that theme?
India is the most richly multilingual society I know, in at least two different ways. One is that there are just many different languages layered over the same geography — Marathi, Tamil, Kannada, Hindi, and so on.
But if you look specifically at Bombay, because it’s so cosmopolitan, you have interesting mixes of people. There have been large communities that speak Hindi and Tamil and English, a large community that speaks Marathi and English and Hindi, but no overlap between Marathi and Tamil. People speak Gujarati and Hindi and English, but little overlap between Gujarati and Marathi or Tamil.
The second way in which it is very interesting is the diversity of regional accents, even for English, across the country. So I can’t imagine a more culturally appropriate location for emerging markets and multilingual societies.
What research challenges are driving the Alexa team’s hiring?
There are two aspects of multilinguality again. One is where you have people switching between languages: one sentence is in one language; the next sentence is in a different language.
And then you have midsentence code switching. There’s a song called “Swag Se Swaagat”, and if I want Alexa to play this song for me, I might say, "Alexa, Swag vaala gaana play karo". The overall syntax is Hindi, but the vocabulary is mixed. The word “play” is mixed in.
The second challenge is exemplified by one of the Alexa Prize challenges, which is having longer natural conversations. So dialogue is an important area of research — natural dialogues, extended dialogues, mixed-initiative dialogues.
With ASR [automatic speech recognition] as well, we’re especially looking for people who are applying end-to-end deep-learning approaches for speech. Within ASR, a dominant approach is a combination of DNNs [deep neural networks] and HMMs [hidden Markov models]. But with end-to-end, I just start with signal, the whole stack is deep learning, and the output is the transcribed speech.
What are some of the technical challenges posed by multilingual systems, which single-language systems don’t have to face?
How do you get a language model [which uses word sequence probabilities to decide among transcription alternatives] that’s good when you have a mixed vocabulary? Because you may not see as many examples of everything as you’re used to. And when you’re switching languages — if I ask one question in English and then follow up in Hindi — now it’s even harder, because now you have to do language ID to decide which language is being spoken.
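As a rough illustration of the language-modeling problem described here, the following Python sketch builds a toy bigram model with add-one smoothing and uses it to choose between transcription alternatives. The corpus, the hypotheses, and the function names are invented for illustration; this is a minimal sketch, not a description of Alexa's language model.

```python
import math
from collections import Counter

# Toy training corpus (invented): mostly English requests plus a single
# code-switched Hindi-English request, so mixed-vocabulary data is scarce.
corpus = [
    "play the song",
    "play the next song",
    "stop the song",
    "swag vaala gaana play karo",
]

context_counts = Counter()  # how often each word appears as a left context
bigram_counts = Counter()   # how often each (previous, next) pair appears
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    context_counts.update(tokens[:-1])
    bigram_counts.update(zip(tokens, tokens[1:]))

# Words that can follow a context: every corpus word plus the end marker.
vocab = {word for sentence in corpus for word in sentence.split()} | {"</s>"}


def log_prob(sentence):
    """Log-probability of a sentence under the add-one-smoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, word)] + 1
        denominator = context_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total


def best_hypothesis(hypotheses):
    """Pick the transcription alternative the language model finds most likely."""
    return max(hypotheses, key=log_prob)


print(best_hypothesis(["play the song", "play the sung"]))  # → play the song
```

With only one code-switched sentence in the corpus, every mixed-language word sequence that differs from it falls back on smoothing alone, which is the scarcity of examples the answer refers to.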
Building a single recognizer that does both English and Hindi will, with today’s technology, result in lower accuracy. We know that we get best performance if the recognizer focuses on a language. But if you just do language ID, that increases latency, because you’ve added another component. And you don’t want to run two recognizers in parallel and pick the best one at the end, because now you’ve wasted all that compute.
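The trade-off between the two designs can be made concrete with a small Python sketch. The timings below are hypothetical placeholders, not measurements, and the model ignores scheduling and communication overhead; it only shows where the extra latency and the extra compute come from.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    latency_ms: float  # time until a transcript is available
    compute_ms: float  # total processing time spent across components


def lid_then_recognize(lid_ms, asr_ms):
    """Language ID runs first, then one language-specific recognizer."""
    return Cost(latency_ms=lid_ms + asr_ms, compute_ms=lid_ms + asr_ms)


def recognize_in_parallel(asr_ms_per_language):
    """Every recognizer runs at once; all but one result is thrown away."""
    return Cost(
        latency_ms=max(asr_ms_per_language),
        compute_ms=sum(asr_ms_per_language),
    )


# Hypothetical timings: 50 ms for language ID, 300 ms per recognizer.
sequential = lid_then_recognize(lid_ms=50, asr_ms=300)
parallel = recognize_in_parallel([300, 300])

print(sequential)  # latency 350 ms, compute 350 ms
print(parallel)    # latency 300 ms, compute 600 ms
```

Under these assumed numbers, the language-ID pipeline adds latency, while the parallel design keeps latency flat but roughly doubles the compute.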
So language modeling is a challenge, and computational efficiency is a challenge. And if you go further down, data sparsity. It’s really data sparsity given the kinds of things you’re trying to model. You may have a lot of data, but the number of variations is so huge that even with a lot of data, it still creates sparsity.
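A back-of-the-envelope calculation shows why a large corpus can still be sparse. The sketch below assumes a simple bigram view of language and uses invented vocabulary and corpus sizes; the function name is hypothetical. It computes an upper bound on the fraction of possible word pairs a corpus could cover, since a corpus with N bigram occurrences contains at most N distinct bigrams.

```python
def max_bigram_coverage(num_bigram_occurrences, vocab_size):
    """Upper bound on the fraction of possible word pairs a corpus can contain."""
    possible_pairs = vocab_size ** 2
    return min(1.0, num_bigram_occurrences / possible_pairs)


# Hypothetical sizes: one billion bigram occurrences of training data,
# a 50,000-word vocabulary per language.
single_language = max_bigram_coverage(10**9, 50_000)
mixed_languages = max_bigram_coverage(10**9, 50_000 + 50_000)

print(single_language)
print(mixed_languages)
```

Doubling the vocabulary by mixing two languages quadruples the number of possible word pairs, so the same amount of data can cover at most a quarter of the fraction it could before.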
2018 Conference Papers
- Contextual Language Model Adaptation for Conversational Agents, Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra Khatri, Angeliki Metallinou, Anu Venkatesh, Ariya Rastrow
- Statistical Model Compression for Small-Footprint Natural Language Understanding, Grant Strimel, Kanthashree Mysore Sathyendra, Stanislav Peshterliev
- Play Duration based User-Entity Affinity Modeling in Spoken Dialog System, Bo Xiao, Nicholas Monath, Shankar Ananthakrishnan, Abishek Ravi
- Contextual Slot Carryover for Disparate Schemas, Chetan Naik, Arpit Gupta, Hancheng Ge, Lambert Mathias, Ruhi Sarikaya
- Joint Learning of Domain Classification and Out-of-Domain Detection with Dynamic Class Weighting for Satisficing False Acceptance Rates, Joo-Kyung Kim and Young-Bum Kim
- Device Directed Utterance Detection, Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas and Björn Hoffmeister
- R-CRNN: Region-Based Convolutional Recurrent Neural Network for Audio Event Detection, Chieh-Chi Kao, Weiran Wang, Ming Sun, Chao Wang
- A Simple Model for Detection of Rare Sound Events, Weiran Wang, Chieh-Chi Kao, Chao Wang
- Detecting Media Sound Presence in Acoustic Scenes, Constantinos Papayiannis, Justice Amoh, Viktor Rozgic, Shiva Sundara, Chao Wang
Amazon Blog Posts related to Interspeech
- Learning to Recognize the Irrelevant
- How Alexa Is Learning to Ignore TV, Radio, and Other Media Players
- Alexa at Interspeech 2018: How Past Interactions Can Lead to More Natural Experiences
- How Alexa Is Learning to Converse More Naturally
- 3 Questions About Interspeech 2018 with Björn Hoffmeister
- Alexa, Do I Need to Use Your Wake Word? How About Now?
- Contextual Clues Can Help Improve Alexa's Speech Recognizers
- How Alexa Can Use Song-Playback Duration to Learn Customers' Preferences
- Shrinking Machine Learning Models for Offline Use
Connect with us at Interspeech!
If you would like to meet with us in person at the conference, please contact firstname.lastname@example.org.
Are you ready for your next opportunity? Check out our open positions here, and learn more about the Alexa team here. We have global opportunities available, and speech and machine learning scientists will be available to meet at Interspeech.
Meet a few members from the Alexa Team at Interspeech
AWS AI Summit 2018: Delivering on the Promise of AI Together
Alexa VP and Head Scientist Rohit Prasad focuses on AI advances that are delighting customers
AWS re:Invent 2017: Alexa State of the Science
Alexa VP and Head Scientist Rohit Prasad presents the state of the science behind Amazon Alexa.
AWS re:Invent 2017: Alexa State of the Union
Alexa SVP Tom Taylor covers the state of the Alexa business, some early challenges, and how we are approaching emerging trends.
"I spoke to the future and it listened" – Gizmodo
Meet the team of world-class scientists behind Alexa.
2018 Alexa Prize Finals
A look into the 2018 Alexa Prize Challenge
Washington Ideas 2017
Rohit Prasad, VP & Head Scientist, Alexa Machine Learning, talks about the future of Alexa & conversational AI with Alexis Madrigal