As a not-for-profit organization, Partners HealthCare is committed to supporting patient care, research, teaching, and service to the community by leading innovation across our system. Founded by Brigham and Women's Hospital and Massachusetts General Hospital, Partners HealthCare supports a complete continuum of care including community and specialty hospitals, a managed care organization, a physician network, community health centers, home care and other health-related entities. Several of our hospitals are teaching affiliates of Harvard Medical School, and our system is a national leader in biomedical research.
We're focused on a people-first culture for our system's patients and our professional family. That's why we provide our employees with more ways to achieve their potential. Partners HealthCare is committed to aligning our employees' personal aspirations with projects that match their capabilities and creating a culture that empowers our managers to become trusted mentors. We support each member of our team to own their personal development, and we recognize success at every step.
Our employees use the Partners HealthCare values to govern decisions, actions and behaviors. These values guide how we get our work done: Patients, Affordability, Accountability & Service Commitment, Decisiveness, Innovation & Thoughtful Risk; and how we treat each other: Diversity & Inclusion, Integrity & Respect, Learning, Continuous Improvement & Personal Growth, Teamwork & Collaboration.
General Summary/Overview Statement:
Partners HealthCare Systems' (PHS) Enterprise Research Infrastructure and Services (ERIS, http://rc.partners.org) is immediately seeking a Senior Linux Systems Automation Engineer with extensive experience in Puppet. This position works within a multi-disciplinary Scientific Computing and Big-Data team that architects, builds, maintains and supports the scientific computing and analytics systems IDEA (Integrated Data Environment for Analytics) and HPC Platforms for the research mission.
The role is responsible for the build and deployment of our Linux High-Performance Computing (HPC) ecosystem. It includes automating every aspect of these systems: Linux, the hardware and its network, deployment, patching, and technical troubleshooting. The role is challenging and varied, requiring technical, interpersonal and problem-solving abilities.
Using your knowledge, you will develop and maintain the infrastructure needed to support high-throughput data analyses and further the research that includes translational and clinical studies needed to transition findings from the bench to the bedside. In the course of expanding and improving the service capability, you will interface with commercial hardware and software vendors, select and deploy new technologies and create how-to guides for end users.
We are seeking someone with an eagerness to learn and apply technology, versatility and breadth in Linux technical skills, and customer service skills. Ideal candidates must be able to demonstrate an interest in the fields of medical informatics, health sciences, and research, and in working in an academic environment with customers who are world-renowned leaders in their scientific fields.
Principal Duties and Responsibilities:
* Cluster and Systems Administration: Manage and administer production systems used by researchers and Research Centers across the Academic Medical Centers of Partners.
* Puppet Automation: Refactor Puppet code to deploy and maintain systems and applications. Evaluate and adapt Puppet Forge modules or write new classes and modules as necessary.
* Analyze the results of server monitoring and implement changes to improve performance, processing and utilization. Propose, maintain and enforce policies, practices and security procedures.
* Analyze and resolve customer and technical problems, including cluster scheduling parameter tuning, memory/CPU contention, and scientific application compilation and run-time issues. Troubleshoot scheduler submission problems.
* Configure job scheduling parameters for equitable resource sharing and optimum throughput.
* Develop, publish and maintain knowledgebase articles and documentation on systems features, best practices and usage how-to's as well as training and reference materials for the community using the ERIS wiki and knowledge management tools.
* Evaluate, select and deploy hardware and cloud solutions for research scientific computing. This includes CPU- and GPU-based compute, high-speed networking and data storage.
* Field work within the corporate datacenter.
* Maintain the inventory and tracking of HPC computer-related equipment.
* Perform other duties as assigned or required by the situation and circumstances.
Qualifications:
* BA/BS/engineering degree required or equivalent combination of skills/experience. Advanced degree in engineering or related scientific discipline preferred.
* 6 years of experience in managing/administering Linux server environments (CentOS/Red Hat are preferred).
* 3 years of experience with automation and configuration management using Puppet is an absolute must-have (experience with other automation tools such as Ansible is a plus).
* RHEL certifications a plus.
* Puppet Professional Certification is a plus.
* A combination of education and experience may be substituted for requirements.
* Demonstrated ability in providing systems administration of up to several hundred Linux servers in an on-premises environment.
* Hands-on experience writing, maintaining, refactoring and debugging Puppet code and using Hiera to retrieve class parameters.
* Strong scripting skills on Linux (Bash or Python).
* Experience with monitoring software such as Nagios or Ganglia.
* Understanding of DHCP, DNS, TCP/IP, NFS, SMB and HTTP network protocols.
* Strong verbal and written communication, ability to write clear technical documentation.
* High level of initiative and eagerness to learn new technologies.
* Familiarity with information technology security and data privacy considerations applicable to a healthcare environment is advantageous.
* Highly desired:
* Knowledge of HPC job scheduling platforms like LSF, GridEngine or Slurm.
* Experience with server deployment technologies (kickstart, PXE, IPMI).
* Experience with Kerberos authentication.
* Experience providing support to research investigators with diverse computing needs.
Working Conditions:
* Standard office environment with travel to Hospital locations in the Boston Metro area, including the data centers.
* As projects and priorities dictate, flexible work and off-hours are required, including evening, night and weekend hours to cover events, roll-outs and special projects.
* Occasionally lift and carry supplies and equipment weighing up to 25 pounds.
About Partners HealthCare
Partners HealthCare is a not-for-profit health care system that is committed to patient care, research, teaching, and service.