Excited by finding ways to combine Machine Learning (ML) and Human Intelligence to solve problems that ML alone can’t solve?
Amazon Mechanical Turk is a crowdsourcing marketplace that enables individuals or businesses to use human intelligence to complete jobs that humans can do better than computers, providing access to an on-demand, scalable workforce. We connect startups, enterprises, researchers, well-known tech companies, and government agencies with individuals to solve problems in computer vision, machine learning, natural language processing, and more. Through Amazon SageMaker Ground Truth and Amazon Augmented AI (A2I), we integrate seamlessly into ML workflows to help ML scientists automate the process of labeling data for ML model training, or to make inferences through human intelligence when an ML model has insufficient confidence to do so.
As a Senior Data Scientist on the team, you will apply your strong background in ML, statistics and data engineering to devise new ways to combine machine and human intelligence, and to derive new insights into the functioning of one of the world’s largest crowdsourcing marketplaces. You will be a scrappy self-starter, proficient in applying ML methods to generate predictions from complex, noisy, and semi-structured or unstructured datasets, and in building robust pipelines to process, transform and analyze data in near-real-time. Where the data doesn't yet exist, you will be someone who can develop ways to obtain it and integrate it. You will be comfortable combining data engineering with modeling and analysis, to build out the foundation of data you can use to generate more complex insights. You will be as capable of writing documents, influencing an organization and leading a team as you are of digging into the details and executing as an individual. You will thrive in ambiguous and undefined problem spaces where you can largely own and define the problem and the solution, and where you can iterate through experiments and proofs of concept to develop key insights to inform our product.
· Master’s degree in a highly quantitative field (Computer Science, Machine Learning, Operations Research, Statistics, Mathematics, etc.), or equivalent experience
· 4+ years of industry experience in Machine Learning, Predictive Modeling, Data Science, Statistical Analysis, or similar fields
· Track record of applying the scientific method to solve problems by forming hypotheses and testing them through well-documented and reproducible experiments
· Proficient in Python, with the ability to write clear, concise, explainable, testable, production-ready code
· Experience with ML libraries (scikit-learn) and with big-data distributed systems (Hadoop, Hive, MapReduce, Spark, SparkML, Dask)
· Demonstrated expertise in more than one data science discipline (statistics/econometrics, time series forecasting, field experiments, machine learning, deep learning, Bayesian methods, stochastic modeling, operations research)
· Fluent with SQL and with relational database systems and concepts
· Strong written and verbal communication and organizational skills
· 6+ years of industry experience building analytic / ML applications
· Experience with ETL pipeline tools like Airflow, and with code version control systems like Git
· Experience with AWS technologies like Redshift, S3, EC2, Glue, EMR, Kinesis, Lambda, Step Functions, IAM roles and permissions, and CloudFormation
· Knowledge of batch and streaming data architectures
· Familiarity with non-relational databases / data stores (object storage, document or key-value stores, graph databases, column-family databases)
· Experience with unstructured data annotation and human computation
· Demonstrable track record of dealing well with ambiguity and prioritizing conflicting needs