The Amazon Enterprise Risk Management and Compliance (ERMC) team is looking for a System Development Engineer to build systems and automation. You will support critical Risk and Compliance business functions for customers across the world while meeting high uptime SLAs and ensuring robust system performance. You will discover innovative ways to automate and scale our infrastructure as we expand globally. You will work with multiple teams and departments, with plenty of opportunities to learn and grow.
You're a perfect fit if you possess that rare mix of depth in Development, Systems Engineering, and Customer Obsession. You're right for the job if you're comfortable with system administration, networking, and maintaining highly available clustered environments. You'll excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization, and organization. In addition to providing top-tier build-outs and maintenance of systems and infrastructure, System Development Engineers are expected to develop best practices, refine operational procedures, and constantly think proactively and innovatively.
You should have demonstrated the following:
* Expertise in specifying, designing, and implementing entire software stacks, including application and systems layers, with health and performance monitoring and management tools for 24x7 environments.
* Experience running and maintaining a 24x7 Internet-oriented production environment, preferably across multiple data centers and involving software-defined infrastructure.
* Experience with the challenges surrounding efficient operations and failure mode analysis in large, complex distributed systems.
You will be expected to deliver on the following in your first six to twelve months on the job:
* Develop or extend existing infrastructure and application stacks, and develop tools and processes that reduce manual effort and increase overall efficiency.
* Contribute to all phases of the development of a large distributed system, providing hardware, manageability, operability, and performance perspectives on all aspects of the newly architected platform and its upstream/downstream systems.
* Participate in the development and delivery of operability-related features such as system health monitoring, diagnostics, repair, and other self-healing automation.
* Adapt and improve operations management systems and processes to accommodate rapid and increasing growth in systems and traffic.
* Monitor the health of the fleet, automating system health checks, maintenance tasks, and reporting systems as needed.
* Perform various system and application maintenance tasks.
* Manage directly assigned tasks and on-call duties gracefully.
Amazon is a company operating a marketplace for consumers, sellers, and content creators.