At Flexera, Site Reliability Engineering (SRE) is responsible for the reliability of our SaaS offerings. This team works with product development to define our Service Level Objectives (SLOs) and performs the work required to ensure we meet them. These teams employ agile and lean principles in a culture of constant learning and improvement.
As an SRE, you will be tasked with everything from helping with product design to diagnosing issues and writing automation scripts that remediate issues in our production systems. You will be driven to build fault-tolerant, scalable systems and to automate away as much operational toil as you can.
* Help to eliminate operational toil: seek to automate repetitive operations work
* Work with product development teams to ensure that our new features are able to meet SLAs
* Help mature the delivery process for teams: whether defining Jenkins pipelines, designing canary release deploys, building in automated fallbacks, or optimizing the build chain, you help craft the appropriate solution for the product
* Optimize product service code to ensure that it's secure, scalable, and performant
* Optimize testing capabilities to increase the assurances we have with each release
* Improve the fault detection for our services
* Create dashboards that help communicate the metrics for a given product service
* Work with product owners and product engineering teams to perform capacity planning
* Work with product engineering teams to understand performance and behavior patterns
* Be part of an on-call rotation for alerts that require engineering expertise to diagnose
* Help carry out root cause analysis for incidents, and design solutions (both software and human processes) that will help to ensure the same problem doesn't happen in the same way again
* A Computer Science degree, or at least 2 years of related industry experience managing a mission-critical production system
Critical Skills / Competencies:
* A positive attitude and willingness to learn
* Expertise in one or more of the following languages: Python / Go / Java / C# / C / C++
* A solid understanding of data structures and algorithms
* Experience with IaaS and Serverless services from a cloud provider
* A strong understanding of TCP/IP and DNS, and experience designing networks
* Linux system administration experience
* Strong conflict resolution competence
* Excellent written and verbal communication skills
* Experience implementing fault detection and automating fixes
* Experience designing scalable services
* Experience designing distributed, fault-tolerant systems
* A good understanding of SQL databases
* Detail-oriented. The ideal candidate is one who naturally digs as deep as they need to in order to understand the why
The following items are not prerequisites for the role, but they may give you a better idea of what you can expect to come across in your SRE role at Flexera: