Alexa Machine Learning
Give your vision for conversational computing a voice
What is Alexa?
Amazon Alexa is leading the way in making spoken language the next user interface. Alexa is the voice service that powers Amazon’s family of Echo products, Amazon Fire TV, and other third-party products. Echo is a device that you can talk to from across the room to play music, get the news, set timers, make hands-free calls, manage to-do and shopping lists, control your lights and thermostat, and much more.
Technologies We Focus On
The Alexa Science and Machine Learning team contributes to the magic that is Alexa. Our goal is to make voice interfaces ubiquitous and as natural as speaking to a human. We have a relentless focus on the customer experience and customer feedback. To train our speech models, we use many real-world data sources, including customer interactions, and a variety of cutting-edge techniques, such as highly scalable deep learning. Learning at this massive scale requires new research and development. The team is responsible for cutting-edge research and development in virtually all fields of human language technology: automatic speech recognition (ASR), artificial intelligence (AI), natural language understanding (NLU), question answering, dialog management, and text-to-speech (TTS). This interview with VP and head scientist Rohit Prasad provides good insight into our customer-centric approach to research and development.
Alexa scientists and developers have a significant impact on customers’ lives and are leading the industry in its shift toward conversational computing. Alexa scientists and engineers also invent new tools and APIs that accelerate the development of voice services, empowering developers through the Alexa Skills Kit and the Alexa Voice Service. For example, developers can now create a new voice experience by simply providing a few sample sentences.
Our research is primarily customer-focused. Your discoveries in speech recognition, natural language understanding, deep learning, and other disciplines of machine learning can fuel new ideas and applications that have a direct impact on people’s lives. We also firmly believe that our team must engage deeply with the academic community and be part of the scientific discourse. There are many opportunities for presentations at internal Machine Learning conferences, which can be a springboard for publication at premier industry and academic conferences. We also partner with universities through the Alexa Prize.
We encourage the publication of research that will contribute to a future of more natural and engaging computing experiences. Research recently published by the Alexa science team is listed below.
- "Direct Modeling of Raw Audio with DNNs for Wake Word Detection," Kenichi Kumatani, Sankaran Panchapagesan, Minhua Wu, Minjae Kim, Nikko Strom, Gautam Tiwari, Arindam Mandal, ASRU, 2017
- "Just ASK: Building an Architecture for Extensible Self-Service Spoken Language Understanding," Anjishnu Kumar, Arpit Gupta, Julian Chan, Sam Tucker, Bjorn Hoffmeister, and Markus Dreyer, NIPS, 2017
- "On Evaluating and Comparing Conversational Agents," Anu Venkatesh, Chandra Khatri, Ashwin Ram, Fenfei Guo, Raefer Gabriel, Ashish Nagar, Rohit Prasad, Ming Cheng, Behnam Hedayatnia, Angeliki Metallinou, Rahul Goel, Shaohua Yang, and Anirudh Raju, NIPS, 2017
- "Topic-based Evaluation for Conversational Bots," Fenfei Guo, Angeliki Metallinou, Chandra Khatri, Anirudh Raju, Anu Venkatesh, and Ashwin Ram, NIPS, 2017
- "Learning Robust Dialog Policies in Noisy Environments," Maryam Fazel-Zarandi, Shang-Wen Li, Jin Cao, Jared Casale, David Whitney, and Alborz Geramifard, NIPS, 2017
- "Domain-Specific Utterance End-Point Detection for Speech Recognition," Roland Maas, Ariya Rastrow, Kyle Goehner, Gautam Tiwari, Shaun Joseph, Bjorn Hoffmeister, Interspeech, 2017.
- "Zero-Shot Learning across Heterogeneous Overlapping Domains," Anjishnu Kumar, Pavankumar Muddireddy, Markus Dreyer, Bjorn Hoffmeister, Interspeech, 2017.
- "Robust Speech Recognition Via Anchor Word Representations," Brian King, I-Fan Chen, Yonatan Vaizman, Yuzong Liu, Roland Maas, SHK (Hari) Parthasarathi, Bjorn Hoffmeister, Interspeech, 2017.
- "Robust online i-vectors for unsupervised adaptation of DNN acoustic models: A study in the context of digital voice assistants," Harish Arsikere, Sri Garimella, Interspeech, 2017.
- "Compressed time delay neural network for small-footprint keyword spotting," Ming Sun, David Snyder, Yixin Gao, Varun Nagaraja, Mike Rodehorst, Sankaran Panchapagesan, Nikko Strom, Spyros Matsoukas, Shiv Vitaladevuni, Interspeech, 2017.
- "Transfer Learning for Neural Semantic Parsing," Xing Fan, Emilio Monti, Lambert Mathias, and Markus Dreyer, ACL 2017 Workshop on Representation Learning for NLP.
- "Anchored Speech Detection," Roland Maas, Sree Hari Krishnan Parthasarathi, Brian King, Ruitong Huang, Bjorn Hoffmeister, Interspeech, 2016.
- "Multi-task learning and Weighted Cross-entropy for DNN-based Keyword Spotting," Sankaran Panchapagesan, Ming Sun, Aparna Khare, Spyros Matsoukas, Arindam Mandal, Bjorn Hoffmeister, Shiv Vitaladevuni, Interspeech, 2016.
- "LatticeRNN: Recurrent Neural Networks over Lattices," Faisal Ladhak, Ankur Gandhe, Markus Dreyer, Lambert Mathias, Ariya Rastrow, Bjorn Hoffmeister, Interspeech, 2016.
- "Optimizing Speech Recognition Evaluation Using Stratified Sampling," Janne Pylkkonen, Thomas Drugman, Max Bisani, Interspeech, 2016.
- "Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models," Thomas Drugman, Janne Pylkkonen, Reinhard Kneser, Interspeech, 2016.
- "Model Compression applied to small-footprint keyword spotting," George Tucker, Minhua Wu, Ming Sun, Sankaran Panchapagesan, Gengshen Fu, Shiv Vitaladevuni, Interspeech, 2016.
- "Search-based Evaluation from Truth Transcripts for Voice Search Applications," Francois Mairesse, Paul Raccuglia, Shiv Vitaladevuni, SIGIR, 2016.
- "Robust i-vector based Adaptation of DNN Acoustic Model for Speech Recognition," Sri Garimella, Arindam Mandal, Nikko Strom, Bjorn Hoffmeister, Spyros Matsoukas, Sree Hari Krishnan Parthasarathi, Interspeech, 2015.
- "Scalable Distributed DNN Training Using Commodity GPU Cloud Computing," Nikko Strom, Interspeech, 2015.
- "fMLLR based feature-space speaker adaptation of DNN acoustic models," Sree Hari Krishnan Parthasarathi, Bjorn Hoffmeister, Spyros Matsoukas, Arindam Mandal, Nikko Ström, Sri Garimella, Interspeech, 2015.
- "Accurate Endpointing with Expected Pause Duration," Baiyang Liu, Bjorn Hoffmeister, Ariya Rastrow, Interspeech, 2015.
Do you want to give your vision for conversational computing a voice? If so, here are some hints on how you can join our team. Please check out our open positions below, which range from speech and machine-learning scientist to language data specialist and technical program manager. We have hundreds of opportunities available in the following global locations:
Meet Amazonians working in Alexa Machine Learning
Alexa Machine Learning & Science
AWS re:Invent 2017: Alexa State of the Science
Alexa VP and Head Scientist Rohit Prasad presents the state of the science behind Amazon Alexa.
AWS re:Invent 2017: Alexa State of the Union
Alexa SVP Tom Taylor covers the state of the Alexa business, some early challenges, and how we are approaching emerging trends.
"I spoke to the future and it listened" - Gizmodo
Meet the team of world-class scientists behind Alexa.
Introducing the Alexa Prize
The Alexa Prize is an annual competition for university students dedicated to accelerating the field of conversational AI.
2016 MobileBeat Conference Interview
Alexa Head Scientist Rohit Prasad's interview at VentureBeat's 2016 MobileBeat Conference.
Keynote: Conversational AI in Amazon Alexa
A talk by Ashwin Ram, Senior Manager, AI Science, at Udacity Intersect 2017.