Staff Database Reliability Engineer
San Francisco, CA


Job Description

About the Team

Slack's Database Reliability Engineering team builds and operates the platform services that store data at Slack. We write software to manage thousands of stateful hosts, providing many petabytes of online database capacity. We are also in the midst of transitioning Slack's core MySQL infrastructure to use Vitess' flexible sharding and management capabilities. Review our recent presentation slides: Migrating to Vitess at (Slack) Scale.

Slack has a positive, diverse, and supportive culture. We look for people who are curious, inventive, and work to be a little better every single day. In our work together we aim to be smart, humble, hardworking and, above all, collaborative. If this sounds like a good fit for you, why not say hello?

About the Role

What you will be doing

* Being responsible for projects or efforts with strategic value that bring clarity to areas of deeper complexity, empower a team to do its best work within cross-functional organizations, and can affect multiple streams of work
* Developing and leading larger projects, from start to finish, where scope is mostly understood
* Designing and developing new highly-available infrastructure to meet the needs of our growing and evolving product
* Writing software to make the database infrastructure self-managing and self-service
* Advising feature teams on how we can support the database needs of new features under development
* Writing code to capture data about service performance, and creating tools and dashboards to provide insight into that data
* Participating in the Database Reliability Engineering on-call rotation, triaging and addressing production issues as they arise
* Contributing to internal tools that help us improve our operations processes, manage our infrastructure, and scale our systems

What you should have

* You have curiosity about how things work
* You've been developing and operating high-traffic Internet applications and can point to things you've worked on
* You've deployed server software on Linux, and then operated it at scale. You've debugged its problems, and analyzed and optimized its performance
* You are a strong communicator. Explaining complex technical concepts to designers, support, and other engineers is no problem for you
* You enjoy helping onboard new team members, mentoring, and teaching others
* Professional experience operating at least one distributed data storage system, at scale and in a team environment. Some examples include: a relational database like MySQL, a search engine like Solr, or a streaming message bus like Kafka
* Bachelor's degree in Computer Science, Engineering or related field, or equivalent training, fellowship, or work experience
* Solid competency in software engineering, using functional or imperative programming languages -- e.g. PHP, Python, Ruby, Go, C, or Java (used without frameworks)
* Experience using distributed storage systems scaled out across hundreds or thousands of servers

Bonus Points:

* Experience expressing complex questions in SQL, especially MySQL
* Experience using deployment automation/configuration management (Chef a plus)
* Experience with virtualized environments, especially Amazon Web Services
* Experience in a startup environment

Slack is an Equal Opportunity Employer and participant in the U.S. Federal E-Verify program. Women, minorities, individuals with disabilities and protected veterans are encouraged to
