Senior Monitoring Tool Administrator
Location: Dallas, TX
Duration: 6 months, with possible contract-to-hire
* The Tools Operations team is seeking a monitoring tool administrator to support enterprise-wide monitoring tools in the Disaster Recovery (DR) environment.
* The candidate should be skilled and experienced with monitoring tools like Riverbed/OPNET AppInternals and AppResponse, Riverbed SteelCentral Suite, CA UIM - Nimsoft and Spectrum, HP SiteScope, and Elasticsearch (ELK).
* These include, but are not limited to, network and systems monitoring, database monitoring, synthetic transaction monitoring and application performance analysis of Java and .NET based applications.
* The qualifying engineer plans, coordinates and supports the client's disaster recovery exercises for validation and troubleshooting of the monitoring tools.
* The engineer will also be a part of the core monitoring team involved in engineering and operations of monitoring, performance and log analytics tools.
* The role also includes product upgrades and maintenance and troubleshooting of system-related problems; the engineer will be responsible for the 24x7 availability of the tools infrastructure as part of an on-call rotation.
* Responsibilities also include maintaining and developing automation scripts, maintaining and developing the monitoring web application, providing tier 3 troubleshooting and restoration of services, and enabling additional monitoring capabilities for the infrastructure and applications based on project requirements.
* The ideal candidate will be a highly effective communicator, both verbally and in writing, lead operational initiatives and projects, and act with the highest sense of accountability.
* The incumbent will take work assignments from management but is expected to work independently to define, drive, and execute initiatives.
* Lead all DR-related activities to validate, support and troubleshoot monitoring tools.
* Plan, coordinate and implement product upgrades, patches and maintenance activities for the monitoring tools (CA UIM/Nimsoft, Spectrum, HP SiteScope, OPNET/Riverbed).
* Perform customizations and configurations, and develop scripts/interfaces as applicable to enhance monitoring capabilities.
* Work with network, system and storage administrators on routine operations such as performance tuning, upgrades and backup.
* Assist the team in the roadmap, implementation and support of monitoring activities in the cloud (AWS).
* Work with application teams to understand the Java/WebLogic framework and architecture of applications and recommend performance monitoring best practices accordingly.
* Develop and maintain BladeLogic jobs for installation of monitoring agents.
* Work towards automating repeatable processes and tasks using programming scripts.
* Be a highly cohesive team member and a change agent while serving as a subject matter expert (SME).
* Maintain all environments and handle all end-to-end aspects of monitoring as a service.
* Engage with projects, drive the deliverables, and manage expectations with all stakeholders.
* Analyze application and infrastructure monitoring and performance needs by engaging with application owners, design appropriate solutions, and work toward implementation.
* Work closely with server, network, database, and storage administrators for routine operations such as performance tuning, upgrades and backup.
* Plan Disaster Recovery, maintain documentation, and be prepared to conduct periodic testing.
* Set up a governance model for application monitoring.
* Maintain service level agreements with both customer and support organizations.
* Maintain licenses and provide monthly metrics on tool usage.
* Maintain and document procedures, data profiles, design, and architecture.
* Provide on-call support for troubleshooting critical issues and for planned maintenance.
* Lead all change through appropriate release and change management procedures.
* Communicate routinely and effectively to customers, team, inter-team, and management.
* At least 5 to 7 years of experience in application, systems, and network performance monitoring tools in a large enterprise environment with emphasis on high-availability.
* Experience in Disaster Recovery planning, documentation, implementation, and periodic testing.
* Hands-on experience with Riverbed, CA Spectrum, CA UIM, Riverbed SteelCentral Suite, Elasticsearch (ELK), SiteScope, or similar tools for application, systems, performance and network monitoring.
* Strong system administration skills (Unix/Windows).
* Experience working in an ITIL environment and with ITSM tools like ServiceNow.
* Solid understanding of Java/J2EE solutions using WebLogic and Tomcat web servers.
* Familiarity with SQL databases and commands and fundamentals of relational database design.
* Experience working with HTML/web-based technologies and monitoring them via synthetic test tools.
* Knowledge of monitoring Docker container environments on Kubernetes and/or OpenShift platforms.
* Strong verbal and written communication skills.
* Must be able to work effectively in a team environment.
* Knowledge of or experience with monitoring on the AWS cloud.
* Experience with AWS native services such as CloudWatch, CloudTrail, and CloudFormation.
* Experience with open-source tools like Prometheus and Icinga.
* Experience with Java development for automation scripts and web development.
* Strong scripting experience (batch/shell/Perl/Python).
* Application development background.
* Technical Degree or the equivalent combination of education, training, and experience.
* 7+ years of progressive experience in an environment similar to the one described above, with at least 3 to 5 years as the subject matter expert.
* Industry certifications are desirable.
As an equal opportunity employer, ICONMA prides itself on creating an employment environment that
supports and encourages the abilities of all persons regardless of race, color, gender, age, sexual
orientation, citizenship, or disability.