The Azure Site Reliability Engineering (SRE) team is looking for an Engineer with container experience to join their team. You will work directly with the Azure Resource Manager (ARM) team, focusing on increasing the quality, performance, and reliability of one of the most essential services that enable Azure to scale. ARM currently runs the standard set of APIs that empower automation and resource groups in Azure. The platform is an essential piece of how workloads on Azure reach high scale.
The scale of our operations is enormous. Microsoft's products and services are overwhelmingly consumed online, and billions of people use them every day. We need people who enjoy analyzing complicated problems, coming up with creative solutions, and working in focused teams to build things no one has thought of before, all with a focus on production reliability.
SREs are people who take engineering-based approaches to solve operations problems: we like infrastructure, we like seeing how big complicated things work, and most importantly, we gain great satisfaction from making it better. We have backgrounds in lots of things -- of course, Computer Science, System Administration, Networking, Mathematics, and Engineering generally, but you can also find folks who've worked in Physics, Chemistry, Biology, Statistics, and even English.
As SREs we are members of the Production Infrastructure Engineering (PIE) team and our vision is to make it easy for everyone to create, consume, and manage planetary-scale, reliable cloud production services and infrastructure to achieve more. As a team, we bring together significant and complementary capabilities with tooling, infrastructure, monitoring and insights in new ways to increase our perspective. Our diversity of knowledge and experience comes together for the benefit of our users, our colleagues, our business, and ourselves.
If you are excited by this type of challenge, and you love to work in groups of people who are similarly excited, come join us. We value the input of people who aren't afraid to be learning all the time, who celebrate mistakes because they show the way forward, and who are happy to continuously improve. We strongly believe that diverse experiences and backgrounds, and an environment where everyone can feel safe to contribute their own insights in a data-driven, objective, but supportive way, are the key to making the best workplace possible, and the best workplace makes the best products and services. Not only is it the smart thing, it's the right thing.
* Bachelor's degree in Computer Science / Engineering or 7+ years of experience in software development.
* 3+ years of software development: automation-related experience valued in particular.
* 3+ years of experience using scripting languages such as Bash, Python, and PowerShell, or compiled languages such as C and C#. These are most relevant, but others are acceptable.
* 2+ years of distributed system monitoring and telemetry implementation.
* Awareness of, and ability to reason about, modern software & systems architectures, including load-balancing, queueing, caching, distributed systems failure modes generally, microservices, and so on.
* Associated troubleshooting skills, including the ability to follow RPC call-chains across arbitrary network steps. Consequent understanding of monitoring in distributed systems.
* Deep understanding of operating system level concepts such as processes, memory allocation, and the network stack; understanding of how applications are affected by the above, and ability to debug same.
* Experience with working in a team, including coordinating large projects, communicating well, and exercising initiative when presented with problems.
* Generally speaking, practical experience running large-scale online systems is always an advantage.
* Experience with Linux.
* Ability to analyze, understand, and solve complex problems by leveraging technology.
* Knowledge of Azure Resource Manager or a similar API-driven, cloud-scale system is a plus.
* Willingness and ability to respectfully challenge the status quo
* Able to handle high ambiguity and to drive partnerships and clarity.
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
* Work with the Azure Resource Manager team to increase reliability and maintainability at planetary scale
* Work across Azure Production Infrastructure Engineering to drive tools that help deliver insights and automation to simplify the complex world of planetary scale.
* Communicate effectively and partner well with other disciplines of the project team to deliver high quality solutions from envisioning to deployment to live site availability
* Write clean, well-thought-out design and code with an emphasis on quality, simplicity, and maintainability, along with the ability to mentor others to do the same
* Drive and coach others through reviews of design, code, and test cases
* Design systems that prioritize the customer perspective and experience
* Understand and adapt new technologies, tools, methods, and processes from Microsoft and industry
* Influence the team toward the right design and technology implementation, and provide future architectural direction
* Drive architectural consolidation and simplification
* Role model Microsoft values through behaviors and actions; set an example and represent the Microsoft values of leveraging others' work and helping others be successful
Microsoft develops, licenses, and supports software, services, devices, and solutions.