HPC Systems Engineer
Chicago, IL

Job Description

Please make sure to read the job posting in its entirety as it reflects both the University roles and responsibilities, followed by the specific description.

Department: 86755 Research Computing Center

About the Unit:

The University of Chicago Research Computing Center (RCC), a unit in the Office of Research and National Laboratories (RNL), provides high-end research computing resources to researchers at the University of Chicago. It is dedicated to enabling research by providing access to centrally managed High Performance Computing (HPC), storage, and visualization resources. These resources include hardware, software, high-level scientific and technical user support, and the education and training required to help researchers make full use of modern HPC technology and local and national supercomputing resources.

The Office of Research and National Laboratories oversees the conduct of sponsored research, research program development, multi-institutional research institutes, national laboratory board, and contract management functions. RNL supports the development and coordination of research-related communications and educational programs at The University of Chicago. RNL oversees the management of two Department of Energy contracts for Argonne National Laboratory and Fermi National Accelerator Laboratory. When combined with the Lab R&D budgets, the office oversees approximately $1.4 billion in sponsored research. RNL works closely with individual scholars, departments, and divisions to encourage, seed, and coalesce research across the University, Argonne, and Fermilab campuses.

Job Family: Information Technology

Responsible for the design, implementation, and maintenance of new and existing applications, systems architecture, and network infrastructure. Ensures operation and security of all servers and networks. Configures, installs, maintains and upgrades applications and hardware for the organization's infrastructure and for end-user devices.

Career Track and Job Level: Systems Administration

Provides hands-on maintenance for production servers. Installs, configures, and maintains operating systems and utility software, using best practices and professional standards. Recommends, installs, and maintains appropriate levels of hardware and software firewalls associated with specific server implementations. Designs, scales, and builds automated approaches across all stages of infrastructure development. Continually improves automation of services, monitoring, alerting services, and resiliency. Provides tooling to development, operations, and infrastructure teams; patches and maintains Windows and Linux servers. Actively participates in the design and development of products and projects.

P3: Requires in-depth knowledge and experience. Uses best practices and knowledge of internal or external University issues to improve products or services. Solves complex problems; takes a new perspective using existing solutions. Works independently, receives minimal guidance. Acts as a resource for colleagues with less experience.

Role Impact: Individual Contributor

Responsibilities:

The job designs automated, scalable, and rapidly deployable solutions to infrastructure development and server configuration. Works independently to install, configure, and maintain operating systems. Uses best practices and systems knowledge to monitor and alert systems, utility software, and firewalls. Guides maintenance for production servers as well as Windows and Linux servers.

1) Configures, installs, upgrades, and maintains server applications and hardware. Works to safeguard the integrity of computer software. Implements operating system enhancements to improve the reliability and performance of the system.

2) Administers operating systems, maintains security, and implements backup procedures for the organization's information systems and peripheral equipment, such as servers, desktops, printers, and storage devices.

3) Plans and installs necessary patches and upgrades for servers and their associated storage, network, communications, and peripheral sub-systems. Installs and maintains an appropriate level of intrusion detection, monitoring, and auditing software as required.

4) Tracks compliance and maintains documentation for hardware, software, and service inventories for management reports.

5) Performs other related work as needed.

Unit-specific Responsibilities:

1) The University of Chicago Research Computing Center (RCC) is seeking a highly qualified HPC system engineer to join its system and operation team that builds and manages RCC HPC systems and facility operations. The individual in this position will be involved in the management and administration of RCC hardware and software.

2) Installing, configuring, and maintaining large computer clusters/servers and software.

3) Day-to-day operations of the systems including systems administration, monitoring and storage performance up to and including network components.

4) Management of the system's network switch, parallel file system and HPC software stack and tools.

5) Configuration of the scheduling and queuing system.

6) Diagnosing and resolving system operational problems quickly and effectively. Coordinating with vendors to resolve hardware and software problems. Assisting users with access and other help desk ticket requests or issues.

7) Building and deploying open source software and software from vendors/partners.

8) Providing reliable and efficient backups/restores for all managed systems. Maintaining and monitoring the security of the HPC systems and servers.

9) Documenting system administration procedures for routine and complex tasks. Other duties as assigned.

Unit-preferred Competencies:

1) Ability to work well with faculty and researchers.

2) Ability to identify and gain expertise in appropriate new technologies and/or software tools.

3) Ability to function as part of an interactive team while demonstrating self-initiative to achieve the project's goals and the Research Computing Center's mission.

4) Strong analytical skills and problem-solving ability.

Education, Experience, and Certifications:

Minimum requirements include a college or university degree in a related field.

Minimum requirements include knowledge and skills developed through 5-7 years of work experience in a related job discipline.

Preferred Qualifications:

Education:

1) Bachelor's degree in Computer Science or a closely related field, or at least two years of experience in HPC system administration or managing large HPC clusters.

Experience:

1) A minimum of two years of Linux system administration experience in a large distributed computing environment.

Technical Knowledge, Skills or Certifications:

1) Knowledge of Linux.

2) Experience scripting with one or more languages such as Python, shell, or Perl.

3) Experience with Linux build automation tools such as Puppet, Ansible, Git, and Docker.

4) Experience implementing automation and monitoring using shell scripting and other related tools.

5) Experience with installing, configuring, and maintaining job management tools (such as SLURM, Moab, TORQUE, PBS, etc.).

6) Experience with operating system deployment tools (e.g., xCAT, Rocks).

7) Experience configuring, administering, and supporting network storage subsystems (e.g., IBM, NetApp, DataDirect Networks, LSI, etc.).

8) Experience with one or more distributed file systems (GPFS, Lustre, Gluster, etc.).

9) Experience configuring, installing, tuning and maintaining scientific application software.

10) Experience configuring, installing, maintaining and/or using performance monitoring and optimization tools.

11) Experience documenting implementations and system related tasks.

Required Documents:

1) Cover letter

2) Resume

NOTE: When applying, all required documents MUST be uploaded under the Resume/CV section of the application.

FLSA Status: Exempt

Pay Frequency: Monthly

Pay Grade: Depends on Qualifications

Scheduled Weekly Hours: 37.5

Benefits Eligible: Yes

Drug Test Required: No

Health Screen Required: No

Motor Vehicle Record Inquiry Required: No

Posting Date: 2019-06-11-07:00

Remove from Posting On or Before: 2019-12-11-08:00

Posting Statement

The University of Chicago is an Affirmative Action/Equal Opportunity/Disabled/Veterans Employer and does not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity, national or ethnic origin, age, status as an individual with a disability, protected veteran status, genetic information, or other protected classes under the law. For additional information please see the University's Notice of Nondiscrimination.

Staff job seekers in need of a reasonable accommodation to complete the application process should call 773-702-5800 or submit a request via the Applicant Inquiry Form.

The University of Chicago's Annual Security & Fire Safety Report (Report) provides information about University offices and programs that provide safety support, crime and fire statistics, emergency response and communications plans, and other policies and information. The Report can be accessed online at: http://securityreport.uchicago.edu. Paper copies of the Report are available, upon request, from the University of Chicago Police Department, 850 E. 61st Street, Chicago, IL 60637.