Princeton University's Plasma Physics Laboratory (PPPL) has an opening in the Information Technology Department for a High Performance Computing Team Lead. Reporting to the Chief Information Officer, the successful candidate will be responsible for designing, planning, implementing, and maintaining PPPL's research-grade high performance computing environments and for the HPC support staff. This person will also work closely with PPPL staff to ensure that their research computing requirements are addressed and appropriately prioritized. The ideal candidate will have strong written and verbal communication skills.
Provides leadership for the planning and implementation of high performance computing systems. Cultivates a high level of collaboration by meeting regularly with other team leads and working groups.
Troubleshoots and resolves complex software, operating system, and network problems, determining whether the cause lies in the system, hardware, software, or end-user. Relies upon extensive knowledge of server and desktop systems, vendor-supplied diagnostic tools, and web-based information to determine the reason for the malfunction and the appropriate solution. Must be able to make independent decisions to best resolve problems.
Develops, tests, implements, installs, and maintains the operating system and related software for proper server system operation. Configures servers using extensive knowledge of various computers; installs drivers, hardware components, and various operating systems. Monitors server system backups and, when necessary, performs data recovery.
Assists in the troubleshooting of escalated end-user system issues to help maintain consistent lab-wide computing.
Troubleshoots and resolves cyber-security issues pertaining to internal and external firewall and system configurations and settings, to meet government cyber-security requirements and provide consistent, secure networking lab-wide. Serves as backup to the Laboratory's Cyber Security Officer, participating in the incident response process, assessment of cyber security requirements and controls, log reviews and forensics, and vulnerability scanning and remediation.
Documents server system problems related to hardware, software, and setup in prescribed formats, resolving them independently or referring them to the immediate supervisor as needed.
Provides recommendations for non-desktop hardware based on detailed project specifications and changing environment needs. Other duties as assigned.
* Consults with users, vendors and other IT staff to design and specify research computing systems and storage needed
* Installs, maintains and administers research computing systems and clusters
* Analyzes and troubleshoots system level problems with software, data and job submissions
* Enhance communication and productivity by regularly meeting with other team leads and in working groups
* Create and maintain documentation for all systems to ensure greater collaboration and understanding of the environment
* Assist in troubleshooting user related issues, such as code development and deployment
* Provide training where necessary
* Utilize monitoring and diagnostic tools for preventative maintenance of enterprise systems
* Research and provide new technologies based on changing requirements
* Design, deploy and maintain automated configuration management of Linux based systems
* Bachelor of Science in Computer Science or related field
* 5+ years of experience managing High Performance Computing environments
* Experience managing technical staff
* Knowledge of parallel filesystems (such as Lustre) and high speed interconnects (InfiniBand, Ethernet fabrics)
* Strong knowledge of job scheduling technology, such as SLURM
* Strong oral and written communication skills
* Strong multitasking skills
* Experience with configuration management systems, such as Puppet
* Ability to work with and follow guidelines set forth in security benchmarks, such as CIS
* Ability to architect technical solutions for specialized software and data
* General knowledge of networking equipment and techniques
Princeton University is an Equal Opportunity/Affirmative Action Employer and all qualified applicants will receive consideration for employment without regard to age, race, color, religion, sex, sexual orientation, gender identity or expression, national origin, disability status, protected veteran status, or any other characteristic protected by law. EEO IS THE LAW
Salary Grade: ADM, 08
Standard Weekly Hours: 40.00
Eligible for Overtime: No
Benefits Eligible: Yes
Essential Services Personnel (see policy for detail): No
Physical Capacity Exam Required: No
Valid Driver's License Required: No