Amazon strives to be the world's most customer-centric company. To succeed, our products and services must be available to our customers at all times.
Within Amazon, an entire organization named Consumer Reliability Engineering is dedicated to the availability of our shopping experiences worldwide, and we are hiring.
We are responsible for the global availability of the Amazon retail shopping experiences. Ensuring a highly available experience is a massive challenge across the websites and mobile apps of 26 marketplaces, powered by tens of thousands of backend services. Multiply this breadth of scope by the depth of complexity introduced by the diversity of those services' implementation choices, their consumption of AWS services, frequent software updates, and new feature launches, and you begin to get the picture.
To support the growth of complexity while strengthening our culture of resilient software engineering across all of Amazon's consumer SDEs, we are creating a chaos engineering function in New York to ensure that application software, supporting services, and underlying infrastructure components are resilient to failures. We will create chaos experiments at all levels of granularity, from impacting the hosting platforms on which services are running, to introducing latency in system-to-system interactions, to creating a complete loss of a significant portion of our architecture. The learnings from these experiments will drive the improvement of the software owned and operated by thousands of developers and provide guidance to our AWS partners. To accommodate the growing scale and complexity of Amazon, this work simply cannot be done by testers; only large-scale distributed solutions utilizing machine-learned insights into behavioral characteristics have a chance of coping with this challenge.
As a software development engineer in this space, you will love the fast-paced, startup-like environment focused on building systems from the ground up that enable the execution and wide-scale coordination of chaos experiments; aggregation and learning from results; and the definition and execution of architectural improvements across all our software development groups. You will be responsible for scoping and delivering projects end-to-end, leveraging statistical evaluation, pattern recognition, and machine learning. Your group will deliver solutions that protect Amazon's services by proactively proving that the complex service graph powering Amazon retail websites globally is resilient against anomalous conditions created by failures, unexpected customer behavior, and even attackers.
The ideal candidate will have a proven track record of shipping complex software solutions through an agile methodology. You will have the ability to dive deep into a wide variety of problems and technologies to guide the right technical decisions for the products and the businesses you will support. You will bring multiple years of DevOps experience from both owning and operating solutions at scale. You will be a strong communicator and will have proven abilities in both architecture and software design.
Amazon is an Equal Opportunity-Affirmative Action Employer - Minority / Female / Disability / Veteran / Gender Identity / Sexual Orientation