As a Systems Development Engineer in AWS Safety Engineering, you will work on large-scale initiatives that automate the incident and problem management process for all of AWS. You will design and implement systems that automate fault containment, problem diagnosis, and issue resolution across multiple hugely distributed, always-on architectures. These systems will take metric and dependency data from multiple sources, analyze it, and correlate it with customer impact to determine the root cause of an issue without human intervention. As the scale and complexity of AWS grow, this automation is the best way for us to offer our customers a stable and reliable cloud computing platform. We succeed when these systems detect, diagnose, and repair operational defects without customer impact or human intervention.
What You Will Do:
* Play a significant role in building new systems and solutions
* Drive an environment of continuous improvement and world-class efficiency
* Work cross-functionally with other teams to continually improve operational readiness and availability
* Anticipate bottlenecks, make trade-offs, and encourage risk-taking to maximize business benefit
* Evaluate and recommend new and emerging products and technologies
* Drive operational excellence in your organization
You will work with teams across AWS to drive adoption of the solutions built by the team and to influence systems development practices for new and existing products. You will define availability goals for service teams across AWS, along with strategies that make these goals attainable with minimal effort. Your goal will be to remove human error from the day-to-day operations of the massive, always-on, distributed systems that make up AWS.
Amazon is a company operating a marketplace for consumers, sellers, and content creators.