Senior Manager, Data Center Operations
El Segundo, CA

Job Description

Overview

The Senior Manager, Data Center Operations is a critical role at Stamps.com, focused on designing, implementing, and maintaining the company's IT operations infrastructure. The role offers the opportunity to lead initiatives that continuously improve department KPIs, compliance, and SLA adherence. The Senior Manager analyzes business needs and oversees the deployment lifecycle of Stamps.com software while maintaining 99.99+% uptime. Establishing system and application performance objectives, and working with the development team to meet them, is key to success in this role, as is understanding user and process requirements and ensuring they are met through high-quality deliverables. This role reports to the Vice President, Information Technology.

The Data Center Operations (DCO) team operates under the larger IT department. Data Center Operations is responsible for the uptime of Stamps.com's online services. The department is responsible for implementing and maintaining hardware and software solutions required to provide 24x7 highly available web services.
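
For a sense of scale, the 99.99+% uptime target mentioned above can be turned into a downtime budget. The short Python sketch below is illustrative arithmetic only; the function name and the 365-day and 30-day periods are assumptions chosen for the example, not figures from Stamps.com.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float) -> float:
    """Return the maximum downtime, in minutes, that an availability target allows."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


# Hypothetical periods chosen for illustration: a 365-day year and a 30-day month.
yearly = downtime_budget_minutes(99.99, 365)
monthly = downtime_budget_minutes(99.99, 30)
print(f"{yearly:.2f} minutes per year, {monthly:.2f} minutes per 30 days")
```

At 99.99%, the budget works out to roughly 52.6 minutes of downtime per year, or a little over 4 minutes in a 30-day month.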

Company Perks: Competitive pay, 401k with company match, Medical, Dental and Vision Insurance, Employee Stock Purchase Plan, Educational Reimbursement, Commuter benefits, Discount programs, Inventor patent bonuses

Stamps Company Overview:

Stamps.com (NASDAQ: STMP) is the leading provider of postage online and shipping software solutions to nearly 750 thousand customers, including consumers, small businesses, e-commerce shippers, enterprises, and high volume shippers. Stamps.com offers solutions that help businesses run their shipping operations more smoothly and function more successfully under the brand names Stamps.com, Endicia®, ShipStation®, ShippingEasy®, ShipWorks®, and MetaPack®. Stamps.com's family of brands provides seamless access to mailing and shipping services through integrations with more than 475 unique partner applications.

Responsibilities

Primary Objectives:

* Lead a team of level I and II Data Center Operators across 3 shifts covering 24x7x365 operations.
* Direct day-to-day and long-term infrastructure operations related to servers, storage, databases, data center infrastructure, telephony, network, and other technology.
* Conduct operational activities in a manner that is process driven and documented, including relevant change management practices.
* Translate company culture, values, and strategies into practical application for safety and security.
* Clearly articulate principles and guidelines, as well as a continuously evolving collection of best practices.
* Lead the team through coaching, and develop staff in a way that helps them achieve their career goals.

Essential Position Duties (typical monthly, weekly, daily tasks which support the primary objectives):

Lead a team of level I and II Data Center Operators across 3 shifts covering 24x7x365 operations.

* Ability to communicate honestly and directly while having the courage to own their convictions and adhere to strongly held principles and values.
* Continuously improve department KPIs, compliance, and policy adherence while maintaining 99.99+% uptime. The team is responsible for day-to-day software deploys to the Stamps.com production infrastructure and needs to achieve a low error rate to improve deploy efficiency.
* Provide and approve work breakdown schedules and oversee the tasks that are outlined in the project plan.
* Act as primary escalation point for data center operations issues arising internally or externally.
* Independently research the technologies pertaining to a task and draft a test plan for it.

Direct day-to-day and long-term infrastructure operations related to servers, storage, databases, data center infrastructure, telephony, network, and other technology.

* Participate as a member of the IT Infrastructure & Operations management team that is responsible for defining the governance processes of the organization's infrastructure architecture, data centers, networks, telecommunications and infrastructure applications.
* Continually evaluate the current state of the Stamps.com data center application, architecture and services against defined goals and objectives. Adapt operational strategies to maintain a highly available and low latency service.
* Lead the team to deliver consistent functionality and availability of our distributed Splunk environment, including standard operational processes, troubleshooting, and execution of specific project objectives. Help share knowledge, and assist in creation and management of Splunk dashboards, alerts, reports and other knowledge objects.
* Educate the DCO staff on technical and administrative changes to service offerings, and provide guidance and direction for advanced troubleshooting techniques.
* Contribute to the design and execution of the enterprise-wide business continuity and disaster recovery design by defining requirements for new technology implementations and communicating the cost benefits to stakeholders.

Conduct operational activities in a manner that is process driven and documented including relevant change management practices.

* Establish, implement and monitor standard processes, SLAs and metrics for the support of Stamps.com software applications and services.
* Ensure the team follows change management best practices while performing duties including, but not limited to, technology installations, upgrades, patching and migrations.
* Implement and coordinate the creation of technical service procedures and policies, and act as the final escalation point for technical issues within the Data Center Operations team.

Translate company culture, values, and strategies into practical application for safety and security. Clearly articulate principles and guidelines, as well as a continuously evolving collection of best practices. Lead the team through coaching, and develop staff in a way that helps them achieve their career goals.

Qualifications

Education and/or Experience:

* Bachelor's degree in computer science, engineering or a related discipline, or equivalent practical experience.
* Four or more (4+) years of management experience.
* Six or more (6+) years of hands-on design and development experience in a production data center infrastructure environment, including a thorough understanding of data center core services (power, HVAC, security, etc.).
* Four or more (4+) years of experience with either server or network administration.

Data Center Proficiency:

* Data center core services - generators, ATS, UPSs, PDUs, fire dampening systems, CRAC/CRAHs, TCP/IP, L2/3 technologies, application isolation, DNS, packet analysis, Microsoft SQL, SSO, SAML, Splunk Apps & Add-Ons, web application firewalls (WAF), PKIs, LAN technologies, DDoS prevention, Imperva, AWS, web proxy, reverse proxy, load balancing

Skills and Knowledge:

* Ensure operational efficiency and best practices for maintaining the availability and uptime of all 24x7 mission-critical infrastructure.
* Standardize and document all mission critical emergency and operational procedures and maintain as-built documentation.
* Share technical knowledge and operational experience with team members to enhance professional growth and promote cross training of staff.
* Act as technical liaison and subject matter expert between development team and senior management.
* Detailed understanding of Change Management principles
* Basic understanding of standard facilities maintenance affecting the critical environment
* Perform regular inventory including add/move updates and regular audits
* Proactively seeking, identifying, exploring and resolving technology gaps in the production networks, systems and services.
* Troubleshooting general networking issues
* Triaging system and application emergencies including downtime reporting and root cause analysis. Ability to effectively prioritize and drive root cause analysis (RCA) and issue resolution.
* Developing and managing hardware failure and recovery plans for all supported locations
* Strong analytical, problem solving, organizational, and interpersonal skills with the ability to adapt to changes and new ideas.
* Excellent verbal and written communication skills with the ability to communicate well with different types of audiences.
* Ability to work in a fast-paced environment with changing priorities and deadlines.
* Ability to demonstrate urgency when there are customer-impacting issues.

Travel Requirements:

* Less than 5%

EOE/M/F/Vet/Disability
