Skytap is looking for a talented Senior Systems Engineer to join our Infrastructure team. This team underpins Skytap's success, in part by building and maintaining our own internal IaaS. Our technically skilled engineers bring passion and expertise to building and maintaining Skytap's cloud services.
As a member of the team, you will work with others to select, standardize, procure, deploy, validate, and support all storage systems and x86 and IBM Power servers across the company; maintain all datacenter and co-location spaces; manage all hardware and datacenter vendor relationships; and own all hardware asset management and inventory processes.
Responsibilities
* Collaborate with internal service teams to develop infrastructure solutions that facilitate new customer features and enable our production services to scale
* Order, procure, build, and perform quality assurance for hardware and storage systems, including configuration of out-of-band interfaces (iLO/iDRAC/BMC), BIOS settings, and firmware levels
* Maintain and enhance storage solutions based on HPE 3PAR, Fibre Channel, iSCSI, and ZFS
* Troubleshoot and repair hardware defects on x86 and IBM Power servers and hardware components (CPU, RAM, RAID controllers, storage devices, network interfaces, power supplies, etc.)
* Develop automation using Ansible and Puppet to standardize deployment and maintenance processes and reduce time to diagnose and repair defects
* Participate in the administration and maintenance of Skytap global infrastructure health, including responding to alerts, diagnosing and recovering from service-impacting incidents, and continuously improving monitoring
* Collaborate with the Infrastructure Networking team on WAN and LAN connectivity, routers, switches, and security
* Develop tools and software components using Python and shell to enable streamlined hardware and datacenter management
* Document processes, procedures, and policies for all infrastructure solutions and components
* Communicate with internal customers as required, including requirements gathering, notification of systemic changes, and participating in incident management and change management processes
* Anticipate some travel to data centers
* Participate in an on-call rotation providing 24x7 coverage for Infrastructure incidents
Qualifications
* 5+ years experience designing, building, and maintaining resilient x86 and IBM Power hardware solutions
* 5+ years technical experience and advanced understanding of Unix/Linux operating systems (Ubuntu preferred)
* 5+ years scripting experience in Bash/Shell
* 3+ years development experience in Python (preferred), Ruby, Go, or Perl
* 3+ years experience with configuration management and orchestration using Ansible and Puppet
* 3+ years experience working with Virtualization/Cloud technologies, such as VMware (preferred), OpenStack, AWS, Azure
* Expertise in SAN storage systems, preferably HPE 3PAR and IBM Storwize v9000
* Experience leading project/functional teams to comprehend complex technical problems and deliver suitable solutions
* Deep understanding of hardware and data center management practices, including space management, power management, racking and cabling standards, and asset inventory/lifecycle processes
* Knowledge and experience with monitoring, logging, and system management tools such as Zabbix, InfluxDB, Kapacitor, Splunk, and ELK
* Ability to troubleshoot network configuration and connectivity, including Software Defined Networks
* Practical knowledge of source code control and CI/CD tools, including Git and Jenkins
* Knowledge of security concepts and technologies, including authentication, authorization, and accounting (AAA), encryption, and server hardening
* Working knowledge of Jira and Confluence
* Ability to multi-task and adapt to changes quickly
* Experience running and maintaining a 24x7 Internet-oriented production environment spanning multiple data centers and thousands of servers
* Demonstrable expertise in specifying, designing, and/or implementing system health and performance monitoring tools and software management tools for 24x7 environments
Why Join Us?
Since our founding in Seattle 13 years ago, our employees have been organically creating a culture defined by their passion, technical excellence, and desire to succeed as a team. These values have helped us navigate more than a decade of disruption in the tech industry, and we're now poised for major growth.
Staying true to who we are as we grow and change means hiring people who share our commitment to customer success and company-wide accountability. We empower each employee to make an impact, to be active ambassadors of diversity and inclusivity, and to play an active role in building a company worthy of our employees' commitment.
We are scaling rapidly and gaining significant market traction. We have raised over $100M in funding, having closed our Series E round in August 2017. Our public cloud service was designed to solve a specific challenge -- migrate and modernize traditional enterprise applications in the cloud -- that is now paramount to CIOs worldwide. Major enterprises in healthcare, retail, media and more depend on Skytap Cloud to compete in the digital economy. We have differentiated, protected technology that decidedly solves an urgent challenge for enterprise IT. And we need more talented people to solve ever-larger customer challenges.
At Skytap we believe that great people build a great company. Skytap is committed to providing an inclusive, positive working environment for all employees, regardless of sexual orientation, ableness, physical appearance, education, age, race, or religion. We're looking for amazing people to join our mission--come be a part of building something great!
Skytap offers a public cloud computing service.