Senior Site Reliability Engineer
San Francisco, CA

Job Description

Moovweb is looking for a senior Site Reliability Engineer to join our Platform team! Moovweb provides a high-speed, scalable platform for hosting modern, dynamic e-commerce websites. Every day, our platform handles a massive amount of traffic for many mid-size and large companies such as 1-800-Flowers, Pep Boys, and United Airlines. We are heavy users of Amazon Web Services, Kubernetes (EKS), and Node.js. Have a passion for speed, scalability, security, automation, and reliability? You could be a great fit for Moovweb!

Requirements:

* At least 2 years of experience building, running and scaling production Kubernetes architectures, preferably on Amazon EKS
* At least 2 years of experience working with Amazon Web Services - including S3, EC2, CloudFormation, EKS, Route 53, RDS and more
* 2-4 years of experience working with production Node.js web services

The ideal candidate will have...

* Knowledge of the tenets of SRE and of best practices related to security, performance, reliability/durability, and disaster recovery.
* Extensive experience working with highly scalable, globally distributed systems
* Strong knowledge of high-performance networking on AWS, Internet protocols, and CDN configuration (preferably Fastly)
* The ability to build and scale large-scale platform architecture, with a strong understanding of common scaling pitfalls and potential tradeoffs
* Experience monitoring large environments using tools like Sumo Logic, Datadog, CloudWatch, etc.
* Passion for good, usable documentation, and an appreciation for how it enables a widely distributed team to function
* Experience designing libraries and tooling to facilitate smooth CI/CD pipelines
* Strong understanding of security best practices
* Comfortable working with critical, customer-facing issues and able to prioritize quickly when escalations happen.
* Passion for making things better and faster!

Responsibilities:

* Be on-call at least one week out of every month for services that the SRE team owns; help triage, then coordinate/resolve escalations as they arise.
* Collaborate with other Moovweb engineering teams to support projects before they go live through activities such as: app design consulting, developing software platforms and frameworks, capacity planning and launch reviews.
* Design and develop automated solutions that improve the performance and reliability of the Moovweb Platform.
* Improve logging, metrics, telemetry, and monitoring to help us ensure high availability and reduce mean time to resolution (MTTR).
* Improve the company's SRE processes, and help build a culture around them.

Bonus points:

* Experience with Varnish and the VCL language
* Experience with Ruby and Ruby on Rails
* Experience with React
* Experience with AWS Lambda and Serverless

We offer a competitive salary, equity, and other benefits, including a 401(k), a full medical/dental/vision package that includes support for families, an unlimited vacation policy, an inviting office in a prime downtown San Francisco location, community volunteering, sports groups, Women@Moovweb, a fully stocked kitchen, yoga, game night, movie night, and fun team outings in the beautiful Bay Area.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.