
Site Reliability Engineer (SRE)
San Francisco, CA

Job Description

201 Third Street (61049), United States of America, San Francisco, California

At Capital One, we're building a leading information-based technology company. Still founder-led by Chairman and Chief Executive Officer Richard Fairbank, Capital One is on a mission to help our customers succeed by bringing ingenuity, simplicity, and humanity to banking. We measure our efforts by the success our customers enjoy and the advocacy they exhibit. We are succeeding because they are succeeding.

Guided by our shared values, we thrive in an environment where collaboration and openness are valued. We believe that innovation is powered by perspective and that teamwork and respect for each other lead to superior results. We elevate each other and obsess about doing the right thing. Our associates serve with humility and a deep respect for their responsibility in helping our customers achieve their goals and realize their dreams. Together, we are on a quest to change banking for good.

Site Reliability Engineer (SRE)

Are you…

A passionate SRE who is excited about driving large-scale, mission-critical data systems? Are you an innovative engineer who is driven to provide the best possible service to your customers? Are you excited about the technology you can leverage to build systems that "just run"? Can you see the big picture while iterating in small steps to drive constant growth and improvement? If the answer to the above is yes, then Capital One is the place for you!

The Team

As our customers go about their daily lives, they generate a river of data. The data river flows 24 hours a day, touches all lines of business, and represents a tremendous opportunity for Capital One. Using that data, we develop and operate real-time, large-scale, business-critical decision systems. These systems give anyone at Capital One the resources and infrastructure they need to innovate and deliver.

We are the Data team and we are chartered with redefining how everyone at Capital One produces and consumes data. We are using open source and cutting-edge data streaming technologies to build custom experiences that humanize the creation and consumption of data. To continue our rapid growth, we are looking for a driven SRE to join the team!

Our team's philosophy is simple: you build it, you own it (YBYO). That thinking is at the core of how we approach everything we do. We use open source tools and augment them with our own components to address the operational and security needs of our business. We operate with an open mind and challenge the status quo and ourselves to provide the best solutions possible.

On the Data SRE team, we are responsible for the big picture. We apply engineering best practices to run better production systems, we build and own our solutions, and we focus on minimizing operational problems. We use a breadth of tools and methodologies to solve a broad spectrum of problems. Practices such as limiting time spent on operational work, blameless postmortems, and proactive identification of potential outages feed the iterative improvement that is key to both product quality and interesting, dynamic day-to-day work. We leave our egos at the door to ensure a collaborative and creative environment.

The Role

As a member of our SRE team, you will work with product managers and engineers on our team, as well as on partner teams. You will influence, educate, drive, build, and deliver high-quality, highly reliable production systems. You will contribute to product architecture and design by providing architectural and operational input and guidance. You will identify operational best practices and select and implement best-in-class operational solutions. You will collaborate with other teams and lead by example with innovation and best practices.


* Participate in the full product lifecycle: review product architecture and design, provide operational guidance and requirements, support development, and own production
* Collaborate with engineering teams to improve their solutions so that products operate well in production
* Work closely with development teams to ensure that products scale, can be monitored and managed, and provide the required metrics and logs
* Evaluate the products and technologies required to build our systems, select the best tool for the job, and educate the team about how it is used
* Be automation focused: develop code and scripts that augment external technologies to create automated product release pipelines from development to production
* Maintain and support production services by measuring and monitoring availability, latency, and overall system health, and by planning and managing capacity and growth
* Practice sustainable incident response and blameless postmortems to constantly improve our products and processes
* Work with partner teams to lead by example: socialize solutions and approaches, and contribute to the larger organization's approach and methodologies
* Be willing to learn, implement, and contribute to our DevOps-focused ecosystem

Basic Qualifications

* Bachelor's Degree or military experience
* At least 3 years of experience with Unix or Linux environments
* At least 3 years of Site Reliability Engineering (SRE) experience
* At least 2 years of experience working in an Agile environment
* At least 1 year of experience developing automation infrastructure and operational solutions

Preferred Qualifications

* Master's Degree in Computer Science
* 2+ years of coding experience in Scala, Java, or Python
* 1+ years of experience with infrastructure technologies (e.g. Kubernetes, Docker, Terraform, CloudFormation) and data products (Kafka, Cassandra or other non-SQL databases, Storm, Spark, Flink or similar)
* 3+ years of experience building automated production solutions and supporting production/customer facing products
* 1+ years of experience using public cloud services, preferably AWS

At this time, Capital One will not sponsor a new applicant for employment authorization for this position.
