At TGS, at-scale computing is core to what we do, and systems administrators are an essential part of our team. This is a mission-critical role, so we are looking for an extremely reliable professional who is passionate about high-performance computing and understands it deeply, enjoys automating complex tasks, and has experience working in mission-critical production environments.
Must Have
* At least five years of in-depth, hands-on experience with Linux system engineering (Red Hat, CentOS, or Ubuntu)
* Ability to independently write and debug system scripts and programs in Python, Bash, or similar scripting languages
* Ability to independently troubleshoot and debug Linux HPC cluster hardware and software issues
* Experience administering HPC clusters with thousands of nodes
* Experience with HPC cluster software, such as: large-scale file systems (Lustre or similar); cluster schedulers (Slurm or similar); disk-less node management (nfsroot)
* Relevant Bachelor's degree or equivalent combination of relevant education and experience
Desired
* Experience with Linux-based deployment tools such as Kickstart
* Experience with configuration management tools such as Ansible, Salt, Chef, or Puppet
* Experience building and troubleshooting hardware from large enterprise vendors
* Capability to write and debug C++ and C software