Job Description
Responsibilities include
* Troubleshoot and resolve problems in collaboration with other systems engineers, database administrators, software developers, and commercial business colleagues. Strong analytical and troubleshooting skills are essential.
* Participate in the design and development of highly available, fault-tolerant, and scalable systems to support our mission-critical applications.
* Take responsibility for the success of your projects from conception to implementation. Effectively manage timelines and identify all project requirements and roadblocks.
* Write Puppet manifests to build all systems in a consistent and repeatable manner.
* Monitor system and network health, and address problems proactively.
* Respond to alerts and resolve or escalate issues as necessary. Participation in the on-call rotation is required.
* Create and maintain systems documentation and run-books
Minimum Qualifications
* 8+ years of experience as a Linux Systems Administrator or Engineer
* 3+ years of experience with Puppet, Ansible, or other configuration management systems
* Must have existing experience with CentOS, Red Hat, or Oracle Enterprise Linux
* Experience implementing highly available systems in a 24x7 mission-critical environment
* Scripting experience with Bash and Python
* Ability to troubleshoot I/O and system performance problems
* Solid understanding of TCP/IP, VLANs, and load balancing principles
* Excellent project management skills
* Strong emphasis on system security
Preferred Qualifications
* Experience with Hadoop, HDFS, and Spark
* Experience with messaging brokers such as Kafka and RabbitMQ
* Experience with clustered products, floating IPs, and multicast
* Experience with container management and orchestration; Kubernetes and CoreOS preferred
* Familiarity with service registration concepts and practices using Consul
* Experience with monitoring and log collection tools such as Nagios, CheckMK, Prometheus, Grafana, Graylog, Logstash, Kibana, and Filebeat
* OSPF and BGP routing
* Experience with global scaling techniques such as GeoDNS and Anycast