AWS customers have come to rely on our track record of stellar operational performance. Our team ensures that the quickly expanding number and scale of services can deliver on that promise of reliability.
Amazon Web Services is seeking a strong engineer to help drive and implement the long-term technical vision of our world-class Safety Engineering group. We analyze trends and build software used across AWS to reduce the recurrence, duration, and size of customer-impacting events.
As a Software Development Engineer in AWS Safety Engineering, you will work on large-scale initiatives that automate the incident and problem management process for all of AWS. You will design and implement systems that automate fault containment, problem diagnosis, and issue resolution across multiple hugely distributed, always-on architectures. These systems will take metric and dependency data from multiple sources, analyze it, and correlate it with customer impact to determine the root cause of an issue without human intervention. As the scale and complexity of AWS grow, this is the best way for us to offer our customers a stable and reliable cloud computing platform. We succeed once these systems detect, diagnose, and repair operational defects without customer impact or human intervention.
What You Will Do:
* Play a significant role in building new software and services
* Drive an environment of continuous improvement and world-class efficiency
* Work cross-functionally with other teams to continually improve operational readiness and availability
* Anticipate bottlenecks, make trade-offs, and encourage risk-taking to maximize business benefit
* Evaluate and recommend new and emerging products and technologies
* Drive operational excellence in your organization
You will work with teams across AWS to drive adoption of the software built by the team, and influence systems development practices for new and existing products. You will define availability goals for service teams across AWS, along with strategies that make these goals attainable with minimal effort. Your goal will be to remove human error from the day-to-day operations of the massive, always-on, distributed systems that make up AWS.
Amazon is a company operating a marketplace for consumers, sellers, and content creators.