Principal Site Reliability Engineer - Cloud Infrastructure
Plano, TX

Job Description

Join us as we pursue our disruptive vision to make machine data accessible, usable and valuable to everyone. We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers. At Splunk, we're committed to our work, to our customers, to having fun and, most meaningfully, to each other's success. Learn more about Splunk careers and how you can become a part of our journey!


Splunk's Cloud group is looking for an expert Principal SRE to help lead, design and build the next generation of our large-scale cloud offering. You will be working on the core compute platforms in the cloud.

You will:

* Work across the organization to deliver quality products that delight Splunk's passionate users.
* Lead teams of tight-knit, super smart engineers who are building a state-of-the-art, cloud-based environment for massive-scale data processing.
* Mentor and help new engineers to achieve more than they thought possible.

Requirements:

* You are passionate about building and running distributed systems at scale in production. You understand the challenges and trade-offs to be made when building and deploying systems to production.
* Expertise with container deployment and orchestration technologies at scale, with strong knowledge of the fundamentals, including service discovery, deployments, monitoring, scheduling, and load balancing. Knowledge of Kubernetes, Go, and Docker preferred.
* Deep understanding of systems programming (network stack, file system, OS services) and networking (L2 vs. L3, network architecture, VLANs, etc.).
* Knowledge of best practices related to security, performance, and disaster recovery.
* Highly skilled in identifying performance bottlenecks, identifying anomalous system behavior, and determining the root cause of incidents.
* You've demonstrated the ability to work effectively and collaboratively across functions.
* You are enthusiastic about making the many users of your product happier every day.

Preferred skills:

* Experience running multi-cluster environments and a strong understanding of multi-tenancy and its security implications.
* Experience with development and deployment in a hosted cloud environment, preferably AWS.

We value diversity at our company. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying.

For job positions in San Francisco, CA, and other locations where required, we will consider for employment qualified applicants with arrest and conviction records.