
System Development Engineer, AWS Reliability Engineering, RBE Toronto

Job ID: 1869662 | Amazon Dev Centre Canada ULC


Job summary
Be a pioneer for the AWS Region Build Reliability Engineering capability as AWS expands our new region builds. Bring your software and systems engineering expertise, and your passion for innovation and continuous improvement to help us improve efficiency, quality, and reliability for services during region builds. The impact of your work will reduce the time to deploy new regions to our customers while simultaneously increasing builder productivity for service teams. Through this experience, you will apply both your breadth and depth of cloud and reliability engineering expertise across the vast landscape of critical AWS services to materially improve how AWS delivers regions across the globe.

As a Systems Development Engineer in AWS Region Build Reliability Engineering, you will work across new region build initiatives to implement observability that proactively detects service issues. You will design and implement systems that automate alerting, problem diagnosis, and self-healing across multiple in-flight region builds. These systems will synthesize data from multiple sources, determine failure patterns, and apply automation for remediation. These systematic approaches will enhance region stability and reliability, even as the scale and complexity of AWS continue to increase.

We succeed when these systems can detect, diagnose, and repair operational defects without impacting region builds or requiring human intervention.

What You Will Do:
· Play a significant role in building new systems and solutions to support region reliability
· Drive an environment of continuous improvement and efficiency
· Work cross-functionally with service teams to continually improve region readiness and availability
· Anticipate bottlenecks, make trade-offs, and encourage innovation to maximize business benefit
· Evaluate and recommend new and emerging products and technologies
· Drive operational excellence with the aim of reducing MTTD, MTTR, incidents, and issues
You will work with teams across AWS to drive adoption of the solutions built by the team, and influence systems development practices for new and existing products. You will define region build availability/health goals for service teams across AWS, and strategies to make these goals attainable with minimal effort.

Our team also puts a high value on work-life balance. Striking a healthy balance between your personal and professional life is crucial to your happiness and success here, which is why we aren’t focused on how many hours you spend at work or online. Instead, we’re happy to offer a flexible schedule so you can have a more productive and well-balanced life—both in and outside of work.

Here at AWS, we embrace our differences. We are committed to furthering our culture of inclusion. We have ten employee-led affinity groups, reaching 40,000 employees in over 190 chapters globally. We have innovative benefit offerings, and we host annual and ongoing learning experiences, including our Conversations on Race and Ethnicity (CORE) and AmazeCon (gender diversity) conferences. Amazon’s culture of inclusion is reinforced within our 14 Leadership Principles, which remind team members to seek diverse perspectives, learn and be curious, and earn trust.

We are dedicated to supporting our new team members. Our team has a broad mix of experience levels and Amazon tenures, and we’re building an environment that celebrates knowledge sharing and mentorship.


Basic qualifications
· Programming experience with at least one modern language such as C++, C#, Java, Python, Golang, PowerShell, Ruby
· 2+ years of non-internship professional software development experience
· 1+ years of experience contributing to the architecture and design (design patterns, reliability, and scaling) of new and current systems
· Bachelor's or Master's degree in Computer Science, Engineering, Information Technology or related field, or 4 years of engineering experience in lieu of a degree
· 2+ years with Linux or similar UNIX distributions (RHEL, CentOS, Debian, etc.)


Preferred qualifications
· Master's degree in Computer Science, Engineering, Information Technology or related field
· Able to work in a diverse team
· Strong sense of ownership and drive
· Excellent written and verbal communication, analytical and collaborative problem-solving skills
· Experience in Systems Administration, DevOps or Site Reliability Engineering
· Experience specifying, designing, and/or implementing system health and performance monitoring tools
· Experience designing and/or implementing automated software testing, deployment and performance analysis systems
· Experience conducting failure mode analysis in complex distributed systems
· Experience conducting efficiency and duplication analysis across large organizations
· Experience reviewing and refining design and architecture documents presented by partner teams for operational readiness, fault tolerance and scalability
· Experience developing or furthering existing application and system management tools and processes that reduce manual efforts and increase overall efficiency
· Ability to adapt and improve operations management systems and processes to accommodate rapid and increasing growth in systems and traffic
· Experience monitoring the health of the fleet, automating system health, maintenance tasks, and reporting systems as needed
· Experience with very large distributed systems such as large scale distributed database systems, storage farms, and/or horizontally scaled request processing fleets
· Meets/exceeds Amazon’s leadership principles requirements for this role
· Meets/exceeds Amazon’s functional/technical depth and complexity for this role
*Please email AWS Sourcing Recruiter, Krystan Silva, if you have questions.

Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit

**For more information on Amazon Web Services, please visit**

This role will sit in our new headquarters in Northern Virginia, where Amazon will invest $2.5 billion, occupy 4 million square feet of energy-efficient office space, and create at least 25,000 new full-time jobs. Our employees and the neighboring community will also benefit from the associated investments from the Commonwealth, including infrastructure updates, public transportation improvements, and new access to Reagan National Airport.

By working together on behalf of our customers, we are building the future one innovative product, service, and idea at a time. Are you ready to embrace the challenge? Come build the future with us.

If you would like to request an accommodation, please notify your Recruiter.