Sr Site Reliability Engineer
New York, NY

Job Description

Developer Productivity Engineering is a distributed team that owns internal tools used to deploy the services that make up Disney Streaming Services' products. Built for AWS with a variety of open-source software, our tools are used by dozens of engineering teams across the company. We strive to act as a productivity multiplier by offering our customers rich primitives for delivering their services, allowing them to focus more on product.

Site Reliability Engineers fulfill a cross-functional role by driving the delivery of services through to production. Within Developer Productivity, you will help design and operate services to support exponential growth in Disney+ and ESPN+. You'll also collaborate with other engineers to pave the way for the future of infrastructure in AWS, moving beyond traditional practices. You should have a passion for systems engineering, monitoring and observability, and automation.

This position can be worked remotely or from our locations in NYC and SF.

Responsibilities

* Maintain and improve the reliability and operability of services
* Design systems to enable rapid development, high availability, and clear observability
* Write tools and leverage open-source software to automate tasks, with an emphasis on safety and repeatability
* Troubleshoot and resolve performance and reliability issues across the stack, including cloud resources
* Collaborate with engineers to ensure services are designed to be cloud-native, scalable, and easily operated

Requirements

* BS or MS degree in Computer Science, or equivalent experience
* 3+ years of experience writing software on, or operating, *nix platforms
* You're an independent self-learner with excellent problem-solving skills
* You care deeply about code craftsmanship and operational excellence
* You have strong written and verbal communication skills

Nice to have, but not required

* Experience with software containers (e.g. Docker, rkt, runC) and schedulers (e.g. ECS, Kubernetes, Nomad)
* You've directly impacted the reliability and availability of large-scale distributed systems
* Deep understanding of networking, especially routing and the IP stack
* You've deployed and operated geographically distributed, redundant services
* Engagement with open-source communities

Technologies we love

* Languages: Go, Ruby, Bash
* Tools: Ansible, Docker, Git, Graphite, GraphQL, Jenkins, Logstash, Packer, Sensu
* Data stores: DynamoDB, Elasticsearch, PostgreSQL, Redis

Job Type

Full Time

Segment

Direct-to-Consumer and International

Category

Technology

Business

Disney Streaming Services

Postal Code

10011