The Azure Site Reliability Engineering team is looking for an engineer with containers experience to join their team. You will be working directly with the Azure Kubernetes Service (AKS) team, focusing on increasing the quality, performance, and reliability of one of the fastest-growing services in the history of Azure. AKS is a world-class container management and orchestration service for the cloud and beyond, and we need software engineers who are excited about helping the service achieve and maintain high reliability as it scales across the globe.
The scale of our operations is enormous. Microsoft's products and services are overwhelmingly consumed online, and billions of people use them every day. We need people who enjoy analyzing complicated problems, coming up with creative solutions, and working in focused teams to build things no one has thought of before, all in the service of production reliability.
As SREs, we are members of the Production Infrastructure Engineering (PIE) team, and our vision is to make it easy for everyone to create, consume, and manage planetary-scale, reliable cloud production services and infrastructure to achieve more. As a team, we bring together significant and complementary capabilities in tooling, infrastructure, monitoring, and insights in new ways to broaden our perspective. Our diversity of knowledge and experience comes together for the benefit of our users, our colleagues, our business, and ourselves.
If you are excited by this type of challenge, and you love to work in groups of people who are similarly excited, come join us. We value the input of people who aren't afraid to be learning all the time, who celebrate mistakes because they show the way forward, and who are happy to continuously improve. We strongly believe that diverse experiences and backgrounds, and an environment where everyone can feel safe to contribute their own insights in a data-driven, objective, but supportive way, are the key to making the best workplace possible, and the best workplace makes the best products and services. Not only is it the smart thing, it's the right thing.
* Bachelor's degree in Computer Science or Engineering, or 5+ years of experience in software development.
* 3+ years of software development experience; automation-related experience is particularly valued.
* 3+ years of experience using scripting languages such as Bash, Python, and PowerShell, or compiled languages such as C, C#, and Go. These are the most relevant, but others are acceptable.
* Experience with Golang, Terraform, and Prometheus
* Awareness of, and ability to reason about, modern software & systems architectures, including load-balancing, queueing, caching, distributed systems failure modes generally, microservices, and so on.
* Associated troubleshooting skills, including the ability to follow RPC call-chains across arbitrary network steps. Consequent understanding of monitoring in distributed systems.
* Deep understanding of operating system level concepts such as processes, memory allocation, and the network stack; understanding of how applications are affected by the above, and ability to debug same.
* Experience with working in a team, including coordinating large projects, communicating well, and exercising initiative when presented with problems.
* Generally speaking, practical experience running large scale online systems is always an advantage.
* Experience with Linux
* Experience with Terraform or Ansible
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
* Work with Kubernetes, Docker, and other container technologies
* Work with Azure Kubernetes Service team to increase reliability and maintainability at planetary scale
* Communicate effectively and partner well with other disciplines of the project team to deliver high quality solutions from envisioning to deployment to live site availability
* Write clean, well thought-out design and code with an emphasis on quality, simplicity, and maintainability, along with the ability to mentor others to do the same
* Drive and coach others through reviews of design, code, and test cases
* Design systems that prioritize the customer perspective and experience
* Understand and adapt new technologies, tools, methods, and processes from Microsoft and industry
* Influence the team toward the right design and technology implementation, and provide future architectural direction
* Drive architectural consolidation and simplification
* Role model Microsoft values through behaviors and actions; set an example and represent the Microsoft values of leveraging others' work and helping others be successful
Microsoft is a technology company that develops and supports software, services, and devices.