Give your vision for conversational AI a voice
Amazon Alexa is leading the way in making spoken language the next user interface. Alexa is the voice service that powers Amazon’s family of Echo products, Amazon Fire TV, and other third-party products. Echo is a device that you can talk to from across the room to play music, get the news, set timers, make hands-free calls, manage to-do and shopping lists, control lights, your thermostat and so much more.
Technologies We Focus On
The Alexa AI team contributes to the magic that is Alexa. Our goal is to make voice interfaces ubiquitous and as natural as speaking to a human. We have a relentless focus on the customer experience and customer feedback. We use many real-world data sources including customer interactions and a variety of cutting-edge techniques, like highly scalable deep learning, to train our speech models. Learning at this massive scale requires new research and development. The team is responsible for cutting-edge research and development in virtually all fields of human language technology. This WIRED article, which includes interviews with several Alexa scientists, provides good insight into our customer-centric approach to research and development, as does this interview with Rohit Prasad, vice president and head scientist, Amazon Alexa.
Alexa scientists and developers have significant impact on customers’ lives and are leading the industry in its shift toward conversational AI. Alexa scientists and engineers also invent new tools and APIs to accelerate development of voice services by empowering developers through the Alexa Skills Kit and the Alexa Voice Service. For example, developers can now create a new voice experience by simply providing a few sample sentences.
Our research is primarily customer focused. Your discoveries in speech recognition, natural language understanding, deep learning, and other disciplines of machine learning can fuel new ideas and applications that have direct impact on people’s lives. We also firmly believe that our team must engage deeply with the academic community and be part of the scientific discourse. There are many opportunities for presentations at internal Machine Learning conferences, which can be a springboard for publication at premier industry and academic conferences. We also partner with universities through the Alexa Prize.
We encourage the publication of research that will contribute to a future of more natural and engaging computing experiences. Research recently published by the Alexa science team is listed below.
- Parsing Coordination for Spoken Language Understanding, Sanchit Agarwal, Rahul Goel, Tagyoung Chung, Abhishek Sethi, Arindam Mandal, Spyros Matsoukas, SLT 2018
- Direct Optimization of F-Measure for Retrieval-Based Personal Question Answering, Rasool Fakoor, Amanjit Kainth, Siamak Shakeri, Christopher Winestock, Abdel-Rahman Mohamed, Ruhi Sarikaya, SLT 2018
- A Re-Ranker Scheme for Integrating Large Scale NLU Models, Chengwei Su, Rahul Gupta, Shankar Ananthakrishnan, Spyros Matsoukas, SLT 2018
- LSTM-Based Whisper Detection, Zeynab Raessy, Kellen Gillespie, Chengyuan Ma, Thomas Drugman, Jiacheng Gu, Roland Maas, Ariya Rastrow, Bjorn Hoffmeister, SLT 2018
- Comprehensive Evaluation of Statistical Speech Waveform Synthesis, Thomas Merritt, Bartosz Putrycz, Adam Nadolski, Tianjun Ye, Daniel Korzekwa, Wiktor Dolecki, Thomas Drugman, Alexis Moinet, Andrew Breen, Rafal Kuklinski, Nikko Strom, Roberto Barra-Chicote, SLT 2018
- Contextual Topic Modeling for Dialog Systems, Chandra Khatri, Rahul Goel, Behnam Hedayatnia, Angeliki Metallinou, Anu Venkatesh, Raefer Gabriel, Arindam Mandal, SLT 2018
- Parameter Generation Algorithms for Text-to-Speech Synthesis with Recurrent Neural Networks, Viacheslav Klimkov, Alexis Moinet, Adam Nadolski, Thomas Drugman, SLT 2018
- Scalable Language Model Adaptation for Spoken Dialogue Systems, Ankur Gandhe, Ariya Rastrow, Björn Hoffmeister, SLT 2018
- Coupled Representation Learning for Domains, Intents and Slots in Spoken Language Understanding, Jihwan Lee, Dongchan Kim, Ruhi Sarikaya, Young-Bum Kim, SLT 2018
- Supervised Domain Enablement Attention for Personalized Domain Classification, Joo-Kyung Kim, Young-Bum Kim, EMNLP 2018
- Joint Learning of Domain Classification and Out-of-Domain Detection with Dynamic Class Weighting for Satisficing False Acceptance Rates, Joo-Kyung Kim and Young-Bum Kim, Interspeech 2018
- Detecting Media Sound Presence in Acoustic Scenes, Constantinos Papayiannis, Justice Amoh, Viktor Rozgic, Shiva Sundaram, Chao Wang, Interspeech 2018
- R-CRNN: Region-Based Convolutional Recurrent Neural Network for Audio Event Detection, Chieh-Chi Kao, Weiran Wang, Ming Sun, Chao Wang, Interspeech 2018
- A Simple Model for Detection of Rare Sound Events, Weiran Wang, Chieh-Chi Kao, Chao Wang, Interspeech 2018
- Device-Directed Utterance Detection, Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Bjorn Hoffmeister, Interspeech 2018
- Design Challenges in Named Entity Transliteration, Yuval Merhav, Stephen Ash, COLING 2018
- Contextual Language Model Adaptation for Conversational Agents, Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra Khatri, Angeliki Metallinou, Anu Venkatesh, Ariya Rastrow, Interspeech 2018
- Statistical Model Compression for Small-Footprint Natural Language Understanding, Grant Strimel, Kanthashree Mysore Sathyendra, Stanislav Peshterliev, Interspeech 2018
- Play Duration based User-Entity Affinity Modeling in Spoken Dialog System, Bo Xiao, Nicholas Monath, Shankar Ananthakrishnan, Abishek Ravi, Interspeech 2018
- Contextual Slot Carryover for Disparate Schemas, Chetan Naik, Arpit Gupta, Hancheng Ge, Lambert Mathias, Ruhi Sarikaya, Interspeech 2018
- Fast and Scalable Expansion of Natural Language Understanding Functionality for Intelligent Agents, Anuj Goyal, Angeliki Metallinou, Spyros Matsoukas, NAACL 2018
- Selecting Machine-Translated Data for Quick Bootstrapping of a Natural Language Understanding System, Judith Gaspers, Penny Karanasou, Rajen Chatterjee, NAACL, 2018.
- The Alexa Meaning Representation Language, Thomas Kollar, Danielle Berry, Lauren Stuart, Karolina Owczarzak, Tagyoung Chung, Lambert Mathias, Michael Kayser, Bradford Snow, Spyros Matsoukas, NAACL, 2018.
- Efficient Large-Scale Domain Classification With Personalized Attention, Young-Bum Kim, Dongchan Kim, Anjishnu Kumar, Ruhi Sarikaya, ACL 2018
- A Scalable Neural Shortlisting-Reranking Approach for Large-Scale Domain Classification in Natural Language Understanding, Young-Bum Kim, Dongchan Kim, Joo-Kyung Kim, Ruhi Sarikaya, NAACL 2018
- Monophone-Based Background Modeling for Two-Stage On-Device Wake Word Detection, Minhua Wu, Sankaran Panchapagesan, Ming Sun, Jiacheng Gu, Ian Thomas, Shiv Naga Prasad Vitaladevuni, Bjorn Hoffmeister, Arindam Mandal, ICASSP 2018
- Time-Delayed Bottleneck Highway Networks Using A DFT Feature for Keyword Spotting, Jinxi Guo, Kenichi Kumatani, Ming Sun, Minhua Wu, Anirudh Raju, Nikko Strom, Arindam Mandal, ICASSP 2018
- Multilayer Adaptation-Based Complex Echo Cancellation and Voice Enhancement, Jun Yang, ICASSP 2018
- Combining Acoustic Embeddings and Decoding Features for End-of-Utterance Detection in Real-Time Far-Field Speech Recognition Systems, Roland Maas, Ariya Rastrow, Chengyuan Ma, Guitang Lan, Kyle Goehner, Gautam Tiwari, Shaun Joseph, Bjorn Hoffmeister, ICASSP 2018
- Context Aware Conversational Understanding for Intelligent Agents with a Screen, Vishal Ishwar Naik, Angeliki Metallinou, Rahul Goel, AAAI, 2018.
- Multi-Task Learning For Parsing The Alexa Meaning Representation Language, Vittorio Perera, Tagyoung Chung, Thomas Kollar, Emma Strubell, AAAI, 2018.
- Conversational AI: The Science Behind the Alexa Prize, Ashwin Ram, Rohit Prasad, Chandra Khatri, Anu Venkatesh, Raefer Gabriel, Qing Liu, Jeff Nunn, Behnam Hedayatnia, Ming Cheng, Ashish Nagar, Eric King, Kate Bland, Amanda Wartick, Yi Pan, Han Song, Sk Jayadevan, Gene Hwang, Art Pettigrue, 2018.
- Direct Modeling of Raw Audio with DNNs for Wake Word Detection, Kenichi Kumatani, Sankaran Panchapagesan, Minhua Wu, Minjae Kim, Nikko Strom, Gautam Tiwari, Arindam Mandal, ASRU, 2017.
- Just ASK: Building an Architecture for Extensible Self-Service Spoken Language Understanding, Anjishnu Kumar, Arpit Gupta, Julian Chan, Sam Tucker, Bjorn Hoffmeister, and Markus Dreyer, NIPS, 2017.
- On Evaluating and Comparing Conversational Agents, Anu Venkatesh, Chandra Khatri, Ashwin Ram, Fenfei Guo, Raefer Gabriel, Ashish Nagar, Rohit Prasad, Ming Cheng, Behnam Hedayatnia, Angeliki Metallinou, Rahul Goel, Shaohua Yang, and Anirudh Raju, NIPS, 2017.
- Topic-based Evaluation for Conversational Bots, Fenfei Guo, Angeliki Metallinou, Chandra Khatri, Anirudh Raju, Anu Venkatesh, and Ashwin Ram, NIPS, 2017.
- Learning Robust Dialog Policies in Noisy Environments, Maryam Fazel-Zarandi, Shang-Wen Li, Jin Cao, Jared Casale, David Whitney, and Alborz Geramifard, NIPS, 2017.
- Domain-Specific Utterance End-Point Detection for Speech Recognition, Roland Maas, Ariya Rastrow, Kyle Goehner, Gautam Tiwari, Shaun Joseph, Bjorn Hoffmeister, Interspeech, 2017.
- Zero-Shot Learning across Heterogeneous Overlapping Domains, Anjishnu Kumar, Pavankumar Muddireddy, Markus Dreyer, Bjorn Hoffmeister, Interspeech, 2017.
- Robust Speech Recognition Via Anchor Word Representations, Brian King, I-Fan Chen, Yonatan Vaizman, Yuzong Liu, Roland Maas, SHK (Hari) Parthasarathi, Bjorn Hoffmeister, Interspeech, 2017.
- Robust online i-vectors for unsupervised adaptation of DNN acoustic models: A study in the context of digital voice assistants, Harish Arsikere, Sri Garimella, Interspeech, 2017.
- Compressed time delay neural network for small-footprint keyword spotting, Ming Sun, David Snyder, Yixin Gao, Varun Nagaraja, Mike Rodehorst, Sankaran Panchapagesan, Nikko Strom, Spyros Matsoukas, Shiv Vitaladevuni, Interspeech, 2017.
- Transfer Learning for Neural Semantic Parsing, Xing Fan, Emilio Monti, Lambert Mathias, and Markus Dreyer, ACL 2017 Workshop on Representation Learning for NLP.
- Deep Learning Based Automatic Volume Control And Limiter System, Jun Yang, Philip Hilmes, Brian Adair, David W. Krueger, ICASSP 2017
- Anchored Speech Detection, Roland Maas, Sree Hari Krishnan Parthasarathi, Brian King, Ruitong Huang, Bjorn Hoffmeister, Interspeech, 2016.
- Multi-task learning and Weighted Cross-entropy for DNN-based Keyword Spotting, Sankaran Panchapagesan, Ming Sun, Aparna Khare, Spyros Matsoukas, Arindam Mandal, Bjorn Hoffmeister, Shiv Vitaladevuni, Interspeech, 2016.
- LatticeRNN: Recurrent Neural Networks over Lattices, Faisal Ladhak, Ankur Gandhe, Markus Dreyer, Lambert Mathias, Ariya Rastrow, Bjorn Hoffmeister, Interspeech, 2016.
- Optimizing Speech Recognition Evaluation Using Stratified Sampling, Janne Pylkkonen, Thomas Drugman, Max Bisani, Interspeech, 2016.
- Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models, Thomas Drugman, Janne Pylkkonen, Reinhard Kneser, Interspeech, 2016.
- Model Compression applied to small-footprint keyword spotting, George Tucker, Minhua Wu, Ming Sun, Sankaran Panchapagesan, Gengshen Fu, Shiv Vitaladevuni, Interspeech, 2016.
- Search-based Evaluation from Truth Transcripts for Voice Search Applications, Francois Mairesse, Paul Raccuglia, Shiv Vitaladevuni, SIGIR, 2016.
- Robust i-vector based Adaptation of DNN Acoustic Model for Speech Recognition, Sri Garimella, Arindam Mandal, Nikko Strom, Bjorn Hoffmeister, Spyros Matsoukas, Sree Hari Krishnan Parthasarathi, Interspeech, 2015.
- Scalable Distributed DNN Training Using Commodity GPU Cloud Computing, Nikko Strom, Interspeech, 2015.
- fMLLR based feature-space speaker adaptation of DNN acoustic models, Sree Hari Krishnan Parthasarathi, Bjorn Hoffmeister, Spyros Matsoukas, Arindam Mandal, Nikko Ström, Sri Garimella, Interspeech, 2015.
- Accurate Endpointing with Expected Pause Duration, Baiyang Liu, Bjorn Hoffmeister, Ariya Rastrow, Interspeech, 2015.
Do you want to give your vision for conversational AI a voice? If so, here are some hints on how you can join our team. Please check out our open positions below, ranging from speech and machine-learning scientist, to language data specialist and technical program manager. We have hundreds of opportunities available in the following global locations:
Meet Amazonians working in Alexa AI
AWS AI Summit 2018: Delivering on the Promise of AI Together
Alexa VP and Head Scientist Rohit Prasad focuses on AI advances that are delighting customers
AWS re:Invent 2017: Alexa State of the Science
Alexa VP and Head Scientist Rohit Prasad presents the state of the science behind Amazon Alexa.
AWS re:Invent 2017: Alexa State of the Union
Alexa SVP Tom Taylor covers the state of the Alexa business, some early challenges, and how we are approaching emerging trends.
"I spoke to the future and it listened" – Gizmodo
Meet the team of world-class scientists behind Alexa.
2016 MobileBeat Conference Interview
Alexa Head Scientist Rohit Prasad's interview at VentureBeat's 2016 MobileBeat Conference
Washington Ideas 2017
Rohit Prasad, VP & Head Scientist, Alexa Machine Learning, talks about the future of Alexa & conversational AI with Alexis Madrigal