Sr. HPC (High Performance Computing) System Administrator
Greenbelt, MD

Job Description

Title: Sr. HPC System Administrator

Location: Greenbelt, MD

ASRC Federal InuTeq provides High Performance Computing services throughout the HPC lifecycle for computational requirements, architecture, acquisition, and operations to federal government customers. Our employees embrace innovation and are committed to a culture of continuous, standards-driven process improvement and assimilation of industry best practices. We are seeking a Senior HPC System Administrator for our NASA NACS High Performance Computing contract.

Summary:

This position is part of an HPC Support team focused on storage hardware and software for two supercomputing clusters. You will specialize in monitoring and managing storage systems, as well as storage-related network management, for a large supercomputer.

Duties and Responsibilities:

* Storage tasks:
  * Hardware installation
  * Hardware testing and daily maintenance/monitoring, LUN configuration and presentation with various controller operating systems, and filesystem and cluster management with GPFS
* Monitor and maintain Discover's storage hardware (spinning disk and NVMe-based) and backend storage network (Fibre Channel)
* Monitor and maintain Discover's GPFS cluster, including all ~3700 clients and 60 NSD servers (plus managers and quorum nodes)
* Monitor and maintain Discover's three high-speed interconnect fabrics (two FDR InfiniBand fabrics and one Omni-Path OPA100 fabric), including cables, switches, firmware, and software components such as the subnet managers (SMs)
* Address user tickets and resolve issues across the cluster
* Attend meetings with high-priority user groups to maintain open lines of communication and address their concerns
* Maintain the test and development system, keeping it consistent with the production cluster
* Advise the customer on new cluster hardware purchases (both storage and compute)
* Assist with benchmarking new products (storage systems and switches) that will potentially be used in production
* Test and verify hardware such as storage and high-speed fabrics to validate it for production

Requirements

* Bachelor's degree in Computer Science, Management Information Systems, or another technical discipline, plus 10 years of relevant work experience, or equivalent
* Experience with HPC parallel filesystems (e.g., GPFS, Lustre)
* Experience with storage systems (data/metadata/IO server configurations in GPFS, spinning disk, SSD, and NVMe)
* Experience with high-speed interconnect networking (e.g., InfiniBand, Omni-Path, Fibre Channel) - cabling, cards, switches, OFED/MOFED, etc.
* Working knowledge of scripting and programming languages such as C, C++, Fortran, Bash, csh, tcsh, Perl, Python, and Ruby
* Strong organizational skills to balance and prioritize work, and the ability to multitask
* Good communication skills for working with support personnel, the customer, and managers

US citizenship and the ability to obtain a Public Trust security clearance are mandatory requirements for this position.

ASRC Federal and its Subsidiaries are Equal Opportunity / Affirmative Action employers. All qualified applicants will receive consideration for employment without regard to race, gender, color, age, sexual orientation, gender identification, national origin, religion, marital status, ancestry, citizenship, disability, protected veteran status, or any other factor prohibited by applicable law.