Site Reliability Engineer
Boston, MA


Job Description

Position Summary

Reporting to the Team Lead, Cloud, Global TSG Infrastructure Operations, this position is responsible for maintaining our Cloud environments to ensure stable, secure, and scalable infrastructure for our customers. As part of the broader Global TSG Infrastructure Operations team, this person will work with teams inside and outside of Bain to provide world-class services in a friendly and supportive manner. They will use development and scripting tools to build, configure, test, deploy and operate cloud resources for groups internal to Bain and for third-party external teams, and will be proactive in monitoring systems, troubleshooting problems, and responding to queries and requests. The fast pace of change in Cloud technologies requires this person to be innately curious, quick to learn, devoted to best practices, and results-driven.

Responsibilities & Duties

* Focus on availability, performance, stability, resiliency and monitoring of cloud infrastructure for customer-facing applications and services
* Proactively identify issues with stability, availability, and security and bring new ideas to the table to improve overall delivery of services
* Make security a priority at every step of the infrastructure, application and product lifecycle
* Build and maintain tools to assist with availability, orchestration, configuration, maintenance and monitoring of cloud environments
* Collaborate with architects, engineers, software teams and stakeholders to determine requirements and be involved in the application lifecycle
* Drive clear, fast, open and accurate communications up, down, and across the organization to align project status and expectations
* Help to establish, implement and adhere to the corporate technology standards and guidelines
* Perform on-call duties, system implementations and upgrades that may occur after normal business hours and on weekends as required to minimize business impact
* Other duties as assigned or as responsibilities dictate



* Bachelor's degree with a demonstrated interest in technology, technology issues and analysis
* Vendor Certifications a plus: AWS, Azure, GCP

Desired Requirements:

* 3-5 years of experience in technical environments as a Site Reliability Engineer or in a similar role operating high-performing datacenter or cloud infrastructure
* Proficient in scripting, infrastructure as code and automation languages and tools such as Ansible, Terraform, AWS CloudFormation, Packer, PowerShell and Python
* Experience with monitoring solutions such as New Relic, Datadog or similar
* Experience with creating redundancy, fail-over, backup and disaster recovery plans and tools for highly available applications
* Experience with alert and escalation tools and processes to support proactive monitoring
* Experience with public cloud providers such as AWS, Azure or GCP
* Understanding of Agile & DevOps processes
* Understanding of CI/CD processes and tools such as Jenkins, Travis CI, Azure Pipelines or similar
* Experience with containers, Docker and Kubernetes a plus
* Track record of applying automation to all aspects of technical operations
* Excellent analytical, conceptual, and problem-solving abilities
* Ability to communicate at all levels of the entire organization and with customers
* Excellent written communication skills
* Attention to detail and strong prioritization and time management skills
* High performance and standards as demonstrated by academic or previous job experience

