Amazon Alexa @ SLT

The Alexa team looks forward to meeting you at SLT 2018! Come and visit our booth to learn more about our research and career opportunities. Below is more information about our technology and team.

Technologies We Focus On

The Alexa Science team made the magic of Alexa possible, but that was just the beginning. Our goal is to make voice interfaces ubiquitous and as natural as speaking to a human. We have a relentless focus on the customer experience and customer feedback. We use many real-world data sources, including customer interactions, and a variety of techniques such as highly scalable deep learning. Learning at this massive scale requires new research and development. The team is responsible for cutting-edge research and development in virtually all fields of Human Language Technology: Automatic Speech Recognition (ASR), Artificial Intelligence (AI), Natural Language Understanding (NLU), Question Answering, Dialog Management, and Text-to-Speech (TTS). See an interview with VP Rohit Prasad here.

Alexa scientists and developers have large-scale impact on customers’ lives and on the industry-wide shift to voice user interfaces. Scientists and engineers on the Alexa team also invent new tools and APIs to accelerate development of voice services by empowering developers through the Alexa Skills Kit and the Alexa Voice Service. For example, developers can now create a new voice experience by simply providing a few sample sentences.

Alexa Research

Your discoveries in speech recognition, natural language understanding, deep learning, and other disciplines of machine learning can fuel new ideas and applications that have direct impact on people’s lives. We firmly believe that our team must engage deeply with the academic community and be part of the scientific discourse. There are many opportunities for presentations at internal machine learning conferences, which act as a springboard for publications at premier conferences. We also partner with universities through the Alexa Prize.


3 questions with Chao Wang, an Amazon senior manager and applied scientist who leads an R&D team focused on acoustic event detection


Q. The theme of this year’s IEEE SLT Workshop is spoken language technology in the era of deep learning: challenges and opportunities. Let’s start with the opportunities. How is the application of deep learning technologies advancing the state of the art in your area of research?

A. Deep learning technologies are helping us address and solve challenges that we couldn’t solve at scale before. For example, my team is using deep learning to enable systems like Alexa to move beyond spoken language understanding, to have a much broader view of what information can be inferred from an audio signal. We are working on acoustic event-detection challenges, as we announced earlier this fall with Alexa Guard. When enabled, Alexa Guard can listen for specific sounds, such as glass breaking or a smoke alarm going off, and send an alert to the customer's phone. With acoustic event detection, you process incoming audio streams and detect certain sound patterns in the signal. With deep learning technologies, the solution to these kinds of challenges is very straightforward. It’s a matter of collecting a lot of data, having humans annotate the data related to the output you want to predict (for example, whether they heard a glass-breaking sound or not), and then training the neural network to detect that sound event. Another opportunity for us is our ability to benefit from the work of so many people in academia and industry who are focused on optimizing deep-learning technologies: what’s the most effective neural network architecture for solving the problem, how to make the most of the data without overfitting, how to make models small and computationally efficient without sacrificing too much accuracy, etc. The state-of-the-art solutions for acoustic event detection and many other challenges are invariably based on deep learning technology. In some respects, the advancements in deep learning have turned all of our problems into nails because it’s such a powerful hammer.
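The recipe described in the answer above (collect clips, have humans label them, train a model to predict the label) can be illustrated with a very small sketch. Everything in it is an assumption made for illustration: the two-number feature vectors, the toy labels, and the names `train_detector` and `detect` are hypothetical, and a single logistic unit trained by gradient descent stands in for the much larger neural networks a production system would use.

```python
import math

def sigmoid(z):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train_detector(features, labels, epochs=500, lr=0.5):
    """Fit a single logistic unit with stochastic gradient descent.

    features: list of equal-length feature vectors, one per annotated clip
    labels:   list of 0/1 human annotations (1 = target sound present)
    """
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the cross-entropy loss w.r.t. the raw score
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def detect(x, weights, bias, threshold=0.5):
    """Return True when the predicted event probability reaches the threshold."""
    score = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
    return score >= threshold

# Toy, made-up annotated data: each clip is summarized by two hypothetical features.
toy_features = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.95],   # annotated "glass breaking"
                [0.1, 0.2], [0.2, 0.1], [0.3, 0.15]]   # annotated "no event"
toy_labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_detector(toy_features, toy_labels)
```

After training, `detect(clip_features, weights, bias)` gives a yes/no decision for a new clip. The threshold controls the trade-off between missed events and false alerts.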

Q. That takes us to the challenges. Do you consider what seems like a one-size-fits-all approach as a potential limitation?  

A. Yes. One of the issues is that, because we have such a powerful hammer, we as researchers are turning everything into nails without thinking more critically about the distinctive challenges of a problem and whether an alternative approach might work better. It’s easy to turn any problem into a data problem under the deep learning paradigm, but data is a big challenge in itself. To make our algorithms work really robustly in the real world, we need LOTS of data to solve the problem well. Sometimes data is difficult to come by. For example, if you want to detect glass breaking, where do you get the data? You can smash some windows to collect the data, but it takes a lot of data, both positive and negative samples, to ensure you have low false rejection and false alarm rates. Today, we need a lot of data, and on top of that, high-quality human annotation of that data is critical. We’re focusing on applying transfer- or unsupervised-learning techniques to address some of the data challenges. In fact, the team has some ICASSP 2019 submissions related to learning from unlabeled data, and techniques for handling “domain shift” caused by using different data sources.
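For readers less familiar with the two error rates mentioned in the answer above, the sketch below shows how they are conventionally computed from annotated clips and detector decisions. The function name `error_rates` and the toy labels are assumptions for illustration and are not taken from the interview.

```python
def error_rates(labels, predictions):
    """Return (false_rejection_rate, false_alarm_rate) for binary detections.

    labels:      True where the event really occurred (human annotation)
    predictions: True where the detector fired
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    missed = sum(1 for y, p in zip(labels, predictions) if y and not p)
    false_alarms = sum(1 for y, p in zip(labels, predictions) if not y and p)
    # Guard against empty classes so the function never divides by zero.
    frr = missed / positives if positives else 0.0
    far = false_alarms / negatives if negatives else 0.0
    return frr, far

# Made-up example: 4 clips with the event, 6 without.
labels      = [True, True, True, True,  False, False, False, False, False, False]
predictions = [True, True, True, False, False, False, True,  False, False, False]

frr, far = error_rates(labels, predictions)
# 1 of 4 real events was missed; 1 of 6 non-event clips triggered the detector.
```

Both rates need enough samples of their own class to be estimated reliably, which is one reason negative examples matter as much as positive ones.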

Q. You’ll be attending SLT this year to keep up to date with research within your field, but also to meet with potential job candidates. What’s the profile of the kind of individual you are seeking to attract to your team to help it advance your work in audio event detection?

A. We have a lot of scientists on the team with diverse backgrounds: audio signal processing, sound source localization, speech recognition, computer vision, general machine learning. It’s important for us to attract a diverse set of talents so we can combine expertise from different fields in solving the problem. We need experts in speech processing who bring to acoustic event detection basic ASR techniques, such as dynamic cepstral mean normalization, an effective noise robustness technology. We have team members with computer vision backgrounds who bring different perspectives to the audio event detection challenge. In fact, we published a paper at Interspeech this year where a team member used a very effective algorithm for object detection in the computer vision field and applied it to audio event detection. We also have generalists who bring the latest advancements in machine learning to the acoustic event detection challenge. Additionally, we have a very active intern program. We host interns throughout the year and some of the interns have returned as full-time researchers after completing school. Our interns typically work on promising research ideas that can lead to publication. What’s really great is that we also see the customer impact of our work. Alexa Guard is just one instance of an entire class of audio detection work we’re doing that hopefully will delight customers.
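As background on the cepstral mean normalization mentioned in the answer above: the idea is to subtract a running estimate of each cepstral coefficient's mean so that a constant channel or microphone offset cancels out. The sketch below shows one common online formulation with an exponentially decaying mean. The interview does not say which variant the team uses, so the function name `dynamic_cmn`, the `alpha` forgetting factor, and the initialization from the first frame are all assumptions.

```python
def dynamic_cmn(frames, alpha=0.995):
    """Normalize cepstral frames with a running (exponentially weighted) mean.

    frames: list of equal-length lists, one cepstral vector per audio frame
    alpha:  forgetting factor; values closer to 1.0 adapt more slowly
    """
    if not frames:
        return []
    # Assumption: start the running mean at the first frame.
    mean = list(frames[0])
    normalized = []
    for frame in frames:
        # Update the per-coefficient mean, then subtract it from the frame.
        mean = [alpha * m + (1.0 - alpha) * c for m, c in zip(mean, frame)]
        normalized.append([c - m for c, m in zip(frame, mean)])
    return normalized

# Made-up cepstral frames, and the same frames with a constant channel offset.
clean = [[1.0, 2.0], [3.0, 1.0], [2.0, 4.0]]
offset = [10.0, -5.0]
shifted = [[c + o for c, o in zip(frame, offset)] for frame in clean]

norm_clean = dynamic_cmn(clean)
norm_shifted = dynamic_cmn(shifted)
```

With this initialization, `norm_clean` and `norm_shifted` agree up to floating-point rounding, which is the robustness property the technique is used for.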

2018 SLT Papers

Learning Noise-Invariant Representations for Robust Speech Recognition Davis Liang, Zhiheng Huang, Zachary Lipton

Contextual Topic Modeling for Dialog Systems Chandra Khatri, Rahul Goel, Behnam Hedayatnia, Angeliki Metallinou, Anushree Venkatesh, Raefer Gabriel, Arindam Mandal

Direct Optimization of F-Measure for Retrieval-Based Personal Question Answering Rasool Fakoor, Amanjit Kainth, Siamak Shakeri, Christopher Winestock, Abdel-Rahman Mohamed, Ruhi Sarikaya

Parsing Coordination for Spoken Language Understanding Sanchit Agarwal, Rahul Goel, Tagyoung Chung, Abhishek Sethi, Arindam Mandal, Spyros Matsoukas

A Re-Ranker Scheme for Integrating Large Scale NLU Models Chengwei Su, Rahul Gupta, Shankar Ananthakrishnan, Spyros Matsoukas

LSTM-based Whisper Detection Zeynab Raeesy, Kellen Gillespie, Chengyuan Ma, Thomas Drugman, Jiacheng Gu, Roland Maas, Ariya Rastrow, Bjorn Hoffmeister

Parameter Generation Algorithms for Text-to-Speech Synthesis With Recurrent Neural Networks Viacheslav Klimkov, Alexis Moinet, Adam Nadolski, Thomas Drugman

Comprehensive Evaluation of Statistical Speech Waveform Synthesis Thomas Merritt, Bartosz Putrycz, Adam Nadolski, Tianjun Ye, Daniel Korzekwa, Wiktor Dolecki, Thomas Drugman, Viacheslav Klimkov, Alexis Moinet, Andrew Breen, Rafal Kuklinski, Nikko Strom, Roberto Barra-Chicote


Blog Posts Related to SLT

Whisper to Alexa, and She’ll Whisper Back

Distributed “re-ranker” ensures that Alexa improvements reach customers ASAP

Context-Aware Deep-Learning Method Boosts Alexa Dialogue System’s Ability to Recognize Conversation Topics by 35%

Varying Speaking Styles with Neural Text-to-Speech

New Approach to Language Modeling Reduces Speech Recognition Errors by Up to 15%

Connect with us at SLT!

If you would like to meet with us in person at the conference, please contact

Are you ready for your next opportunity? Check out our open positions on this page here, and learn more about the Alexa team here. We have global opportunities available, and speech and machine learning scientists will be available to meet at Interspeech.

Meet a few members from the Alexa Team at SLT

Alexa AI

AWS AI Summit 2018: Delivering on the Promise of AI Together

Alexa VP and Head Scientist Rohit Prasad focuses on AI advances that are delighting customers

AWS re:Invent 2017: Alexa State of the Science

Alexa VP and Head Scientist Rohit Prasad presents the state of the science behind Amazon Alexa.

re:MARS 2019: AI Invention That Puts Customers First

Jeff Wilke, CEO, Worldwide Consumer, explains how Amazon has been using AI to develop great customer experiences for nearly 20 years.

"I spoke to the future and it listened" – Gizmodo

Meet the team of world-class scientists behind Alexa.

2018 Alexa Prize Finals

A look into the 2018 Alexa Prize Challenge

Washington Ideas 2017

Rohit Prasad, VP & Head Scientist, Alexa Machine Learning, talks about the future of Alexa & conversational AI with Alexis Madrigal

Find jobs in SLT 2018