Interested in managing platforms at massive scale that enable billions of dollars of revenue? Does working on a highly available platform that handles almost 1 billion transactions per day sound like a fun challenge? Intuit is seeking a Staff SRE to join the Intuit Developer Platform SRE team. This team is core to Intuit's services journey, building platforms that maximize the speed with which Intuit and external developers can build awesome products and services.
We are looking for someone with a strong hands-on operational background who is passionate about managing platforms at scale and automating tasks, and who is ready to tackle the complex problems of scale in AWS, containerization, and traditional data centers. You should be able to design, code, monitor, and troubleshoot to maintain the performance and availability of our platforms. Creativity, excellent diagnostic skills, and a passion for resolving customer problems are a must.
The position is based in San Diego, and candidates should be willing to relocate there.
* Work closely with other SREs, developers, and other stakeholders to support all platforms and services managed by the Intuit Developer Platform team. Ensure that application designs cover the operational requirements of high availability and performance, and that all other resiliency patterns required for a delightful customer experience are included and prioritized.
* Manage infrastructure (across AWS, containers and hosted data centers) as code
* Deliver always-on operational excellence for our services through highly available, scalable, and performant platforms
* Leverage development skills to deliver and deploy monitoring as code.
* Document important processes and procedures, and conduct knowledge transfer (KT) sessions to share the information across the team.
* Troubleshoot complex issues, and manage stakeholders' expectations during incidents.
* Drive and own RCA for specific applications.
* Participate in 12/7 on-call rotations
* 8+ years of overall experience managing enterprise-grade applications running on Unix platforms in a site reliability engineering role, including 3+ years of experience with AWS and containers.
* Proficiency with AWS technologies (NAT Gateway, EC2, ALB/NLB, CloudWatch, IAM, VPC, Route 53, etc.)
* Hands-on experience with containerization and container orchestration.
* Strong experience writing high-quality Python or Go code
* Hands-on experience with Linux and a strong understanding of Unix internals
* Strong troubleshooting skills on enterprise-grade applications handling high-volume TPS
* Strong experience in determining and implementing monitoring requirements.
* Hands-on experience with configuration management systems (boto/Terraform, Chef/Puppet/Ansible, etc.)
* Solid communication skills: Demonstrated ability to explain complex technical issues to both technical and non-technical audiences
* "Self-starter" attitude and ability to make decisions independently
Intuit is a company that provides business and financial management solutions for small businesses, consumers, and accounting professionals.