Amazon Web Services (AWS), the largest cloud provider in the world, is looking for a systems engineer who is ready to take on a leadership role within the Elastic Beanstalk team and its mission to enable the easy creation and update of web applications and services. This person has strong technical skills, problem-solving abilities, and the communication skills to interface with internal team members and internal/external customers. We are looking for an individual who has a passion for engineering novel solutions to complex service challenges and for growing and developing their technical abilities in an operational environment, and who will learn and use Amazon's operational tooling to grow and support the Elastic Beanstalk service.
The successful candidate will have a good mix of technical knowledge and a solid background in large-scale service operations. The candidate should be open to new challenges, good at multitasking, and able to exercise risk-based judgment.
An ideal candidate will meet most of these qualifications:
* Experience running and maintaining a 24x7 Internet-oriented production environment, preferably spanning multiple data centers and hundreds of machines.
* Demonstrable expertise in specifying, designing, and/or implementing system health, performance monitoring tools, and software management tools for 24x7 environments.
* A solid grasp of networking fundamentals, preferably including hands-on experience with load balancers, switches, routers, etc.
* Familiarity with the challenges surrounding efficient operations and failure mode analysis in large, complex distributed systems.
You will be expected to deliver on these kinds of things in the first six to twelve months on the job:
* Participate thoroughly in all phases of the development of a large distributed system, providing hardware, manageability, and performance perspectives on all aspects of the system.
* Define and/or refine hardware requirements and selected designs, balancing raw up-front dollar cost with operational needs and TCO, from the data center infrastructure up.
* Specify and participate in the development and delivery of features such as system health monitoring, diagnostics, repair, and other self-healing automation.
* Develop new, or extend existing, application and system management tools and processes that reduce manual effort and increase overall efficiency.
* Adapt and improve operations management systems and processes to accommodate rapid and increasing growth in systems and traffic.
* Maintain fleet inventory management, including producing, maintaining, and evolving capacity plans for various components.
* Monitor the health of the fleet, automating system health, maintenance tasks, and reporting systems as needed.
* Perform various system maintenance tasks including configuration of new machines.
* Manage directly assigned tasks and on-call duties gracefully.
Amazon is a company operating a marketplace for consumers, sellers, and content creators.