Are you passionate about resilient software and services? Do you know how to build and deploy software that is a joy to operate?
As a site reliability engineer on the Content Delivery team, you will design resiliency into our Origin product by contributing to its code, design and operation. This includes improving the product code, automated test suite, build pipeline, deployment tooling, and operational logs and dashboards. We are a small team that values learning and purpose and strives to balance independent work with team collaboration in an Agile environment.
Who does the site reliability engineer work with?
Our engineers work with several teams to maintain a customer-facing production service that is resilient and scalable. You will work directly with peer software engineers on the Content Delivery team. You will also work frequently with engineers and architects on the Professional Services, Infrastructure and Operations teams.
What are some interesting problems you'll be working on?
We are actively migrating from a virtual-machine deployment paradigm to a cloud-centric one. We are leveraging Docker, Kubernetes, Helm and AWS services to build a software service that is elastic, resilient and global. You will participate in the design, implementation, validation, deployment and operation of this system.
Where can you make an impact?
You will be working on the software service that delivers video entertainment to millions of customers every day. You will collaborate with fellow engineers to build highly scalable, highly resilient software services that are deployed throughout the country and will soon expand into international markets.
* Maintain and improve the operation of business-critical software systems that must be online 24/7/365
* Define, measure and report on SLIs, SLOs and error budgets
* Write and test production-ready code
* Write appropriate documentation
* Participate in design and architecture review sessions
* Build effective push-button deployment and monitoring systems
* Perform code reviews
* Participate in software release and deployment activities
* Collaborate constructively with team members
Relevant Technologies and Skills:
* Docker, Kubernetes, Helm, AWS EC2, S3 and CloudFront, Apache, NGINX, CDN, Video Origin, IP Video, Layer 7 networking, Linux/Unix, Node.js, Python, Bash/SH, Graphite, Prometheus, Grafana, Splunk, Chaos Monkey, Ansible, Jenkins, JMeter, Postman, Git, GitHub.
* Interested in contributing to all aspects of the development lifecycle: design, development, testing, integration, deployment.
* Knowledge of microservice architecture, container-based deployment, infrastructure as code, and how to build elastically scaling services.
* Proficiency with Agile SDLC.
* 4+ years of relevant experience including site reliability engineering, quality engineering or software engineering.
* Proficient practitioner of continuous deployment of microservices into production without requiring service downtime.
* Passionate about automated unit and integration testing.
* Strong social and interpersonal skills.
* Preference for an agile, collaborative engineering environment.
* We are interested in a diverse set of candidates for this position. If you are not sure whether you qualify, please err on the side of applying.
Comcast Technology Solutions (CTS) is a division of Comcast's Technology, Product and Experience (TPX) organization. We strive to build best-in-class services for Comcast and other video delivery businesses.
Comcast is an EOE/Veterans/Disabled/LGBT employer