Director of Incident and Escalation Management
Plano, TX


Job Description

Join us as we pursue our disruptive new vision to make machine data accessible, usable and valuable to everyone. We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers. At Splunk, we're committed to our work, customers, having fun and most importantly to each other's success. Learn more about Splunk careers and how you can become a part of our journey!


As the Director of Incident and Escalation Management, you will be responsible for developing a premier incident and escalation management process and team. In this role you will build and lead a team that advocates for the customer's business impact and liaises with engineering, operations and product groups across the company. We are looking for someone who will challenge the status quo; transform process, tools and ways of thinking; and drive operational excellence throughout the business.

This role requires an understanding of industry-accepted incident management processes such as ITIL and phases of organizational readiness, experience with various methodologies and tools, the ability to look across the enterprise and integrate key information from various problem management initiatives, and a keen ability to rally partners and actively problem-solve.


Strategy, Scale and Process

* Lead the major incident and escalation process to drive the restoration of services quickly for customers while minimizing impact.
* Design, build and execute a set of critical issue processes and mechanisms globally to protect the customer experience, measured by time to stable and time to resolve.
* Develop a 3-year vision that continually reduces the number and severity of issues.
* Build processes and capacity/investment strategy that allow the team to scale to 5x growth.
* Develop a closed-loop system that eliminates, at the source, the defects that caused an escalation, so that future issues are prevented. A key measure will be the elimination of repeat customer issues of the same type.
* Ensure quality team outcomes such as customer-facing Root Cause Analysis (RCA) documents, senior executive readouts, and After Action Reports (AARs). Drive root cause analysis and corrective action completion to help eliminate disruption of services and consequently improve the day-to-day operations of the organization, using validated problem analysis methodology and tracking all elements of the RCA to closure.
* Act as a forward-thinking incident management leader within the company and remain up to date on industry-leading principles and strategies.

Leadership, customer focus, influence and communication

* Lead and grow a diverse team of Incident Managers responsible for responding to, investigating, managing, and resolving high-impact incidents 24x7x365.
* Coordinate efforts across multiple teams in order to ensure an effective incident response capability.
* Create effective reporting for multiple audiences, delivered with an appropriate level of detail on a timely basis.
* Develop processes, partnerships, and resource plans with internal teams (e.g. Development) to ensure immediate action is being executed on raised issues.
* Ensure global teams act as one, including seamless handoffs, and coverage models for holidays and weekends.
* Forge tight partnerships (and processes) with internal leaders (and their respective teams) to ensure delivery of a flawless process with customers.

Insight and Action

* Identify new insights from incident and problem data to help focus future efforts on product and service stability.
* Deliver consistent, industry-leading metrics for incidents and escalations across response time, resolution time, customer satisfaction, and process or technology fixes from root causes.
* Design and deploy a mechanism for tracking and reporting status of escalations that can be shared with customers and internal groups, while automating process and alerting.


* 10+ years of experience using data to drive decision-making and improve operations, communicating with executives, both internal and external, and leading and managing teams.
* 10+ years of experience supporting and troubleshooting commercial end-user software applications, preferably enterprise-level applications.
* 3+ years supporting enterprise software and/or cloud-based technologies.
* Customer focused. Demonstrated record of delivering consistent results in responsiveness, resolution, and CSAT at scale.
* Excellent communicator. Excellent verbal and written skills (specifically in the documentation and presentation of findings), as the role requires heavy interaction with customers, Splunk leadership (at all levels), external vendors, and other strategic partners. Able to clearly explain highly technical issues to a non-technical audience.
* Tenured experience leading global incident management teams.
* Demonstrated strategic and tactical thinking, quantitative and analytical skills, in difficult situations.
* Ability to influence and persuade internal partners without formal authority.
* Demonstrated ability to quickly adapt in dynamic environments.
* Possess a bias for action - willing to move rapidly and decisively to resolve customer issues.
* Ability to work global hours including weekends and holidays as needed.
* B.S. in Computer Science or an equivalent technology field required.

Preferred Qualifications:

* A consistent track record of successfully delivering initiatives from conception through completion.
* Expertise in incident management industry standard methodologies, including ITIL.
* 5+ years in crisis management preferred.
* Experience working across geographies and functional teams (e.g. Engineering, Product Management, Operations, Customer Success, Global Support).
* Experience and hands-on skills with operating systems like UNIX, Linux or Windows.
* Prior experience as a Technical Support or Service Engineer.
* 5+ years of experience as an information security architect.
* Splunk product experience.
* Experience with tools such as JIRA, SFDC, ServiceNow, Slack, and incident orchestration and automation tools (e.g. VictorOps).

We value diversity at our company. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying.

For job positions in San Francisco, CA, and other locations where required, we will consider for employment qualified applicants with arrest and conviction records.
