Site Reliability Engineer
San Francisco, CA

Job Description

Domino has an ambitious vision for data science and machine learning. Our platform helps data science teams accelerate research, increase collaboration, and rapidly deploy predictive models. Our customers are the most sophisticated analytical organizations in the world, including Monsanto, Allstate, and Instacart. Backed by Sequoia Capital, Zetta Venture Partners, Bloomberg Beta, and In-Q-Tel, we are at the epicenter of the data science revolution, helping companies build better cars, develop more effective medicine, or simply recommend the best song to play next.

You will join a team of high-performing engineers and have a significant impact on how we manage a growing infrastructure and deliver our service. You'll be tasked with maintaining the health of the Domino platform in a variety of environments, building reliability into our stack, improving our availability, and customizing our DevOps and deployment toolchain.

Responsibilities

* Instrument and monitor service health; ensure availability
* Automate creation and configuration of infrastructure and services
* Respond to incidents (on-call) and perform root cause analysis
* Enhance infrastructure reliability and efficiency
* Collaborate with application engineers, platform engineers, and product managers to continuously improve Domino

Qualifications

* Strong coding ability (Python, Bash, Scala)
* Systems fluency (Linux, storage, networking)
* Observability systems (New Relic, Prometheus, ELK)
* Modern software components (MongoDB, Redis, Elasticsearch, RabbitMQ, Play, ???)
* Experience with container management (Docker, Kubernetes)
* Infrastructure and configuration automation (Terraform, SaltStack)
* Exceptional problem-solving acumen
