Director, Service Reliability Engineering Oracle
Redwood City, CA

Oracle is a company providing integrated cloud applications and platform services.

Job Description

Manage a team that designs, develops, troubleshoots, and debugs software programs for databases, applications, tools, networks, etc.

As a director of the software engineering division, you will apply your extensive knowledge of software architecture to manage software development tasks associated with developing, debugging or designing software applications, operating systems and databases according to provided design specifications. Build enhancements within an existing software architecture and envision future improvements to the architecture.

Assist in the development of short-, medium-, and long-term plans to achieve strategic objectives. Regularly interact across functional areas with senior management or executives to ensure unit objectives are met. Ability to influence thinking or gain acceptance of others in sensitive situations. Demonstrated leadership and people management skills. Strong communication and analytical skills and a thorough understanding of product development. BS or MS degree or equivalent experience relevant to functional area. 7 years of software engineering or related experience.

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veteran status, or any other characteristic protected by law.

Oracle, the world leader in Enterprise Cloud, is hiring the best and brightest technologists in the industry as we continue to add customer-centric, world-class, leading-edge, secure, hyper-scale solutions throughout all levels of the cloud stack. Oracle's cloud ecosystem is the only complete business cloud platform on the planet, with market-leading and business-transforming solutions spanning SaaS, DaaS, PaaS, and IaaS. Oracle's Cloud applications, such as Enterprise Resource Planning, Customer Relationship Management, Human Capital Management, and Supply Chain Management, are used by thousands of customers across the globe and are the broadest and most innovative in the industry, providing businesses with adaptive intelligence, standardized business processes, and competitive advantage at low cost.

The market-leading Oracle ERP Cloud offers a broad suite of modules and capabilities designed to empower modern finance and deliver customer success with streamlined processes, increased productivity, and improved business decisions.

ERP Cloud Operations is looking for passionate, innovative, high-caliber, team-oriented superstars who want to be a major part of a transformative revolution in the development of modern cloud-based business applications. We are seeking highly capable, best-in-the-world developers, architects, and technical leaders at the very top of the industry in terms of skills, capabilities, and proven delivery, who seek out and implement imaginative and strategic, yet practical, solutions: people who calmly take measured and necessary risks while putting customers first.

Key Tasks and Responsibilities

* Service Ownership - You will be part of the SRE team, whose mission is the shared full stack ownership of a collection of services, with our Service Development and Operations SRE partners.
* Ownership Scope - You will understand the end-to-end configuration, technical dependencies, and overall behavioral characteristics of the production services you own. In partnership with your Service Development and Operations SRE partners, you will have the responsibility to ensure that services are designed and delivered to be mission critical with focus on monitoring, telemetry, security, resiliency, scale, and performance.
* Service Design - You will partner with the SRE Architect, Service Development, and Operations SRE teams in defining and implementing improvements in service architecture, both current and future.

o You will be an expert at articulating the technical characteristics of your services and the dependencies between services, and will guide Service Development teams to engineer and add SRE capabilities to the Oracle SaaS/ERP service portfolio.

o You will participate in feature design reviews to ensure Monitoring, Telemetry, Reliability, Automation, and Runtime Debuggability are represented as a first-class, design-time priority.

o You will provide technical leadership in defining software engineering patterns, practices, and coding standards focused on increasing the reliability and resilience of Oracle SaaS/ERP services. You will deliver code artifacts (reusable components, plug-ins, blueprints, sample code, scripts and tooling, etc.) to streamline adoption by Service Development.

* Operations Engineering - You will understand and be able to communicate the scale, capacity, security, performance attributes and requirements of the services you own. You are a Subject Matter Expert, able to understand and communicate every characteristic of your service stack, such as:

o Degradation and behavior under load of the services and their dependencies.

o End-to-end tuning needs, optimizing resource utilization, as load patterns fluctuate.

o Instrumentation and metrics that clearly describe the service behaviors.

o Scaling requirements and patterns.

o Resiliency and recoverability, ensuring that backup / restore and disaster recovery capabilities are implemented, tested and maintained.

* Technical Experts - You are the ultimate escalation point for complex or critical issues that have not yet been documented as SOPs for Level 1 staff. You will usually get called in during major incidents as an SME, when the source of a problem is unclear. You will have the deep understanding of service topologies and their dependencies required to troubleshoot issues and define mitigations.
* Incident Response - You will be the primary author of technical content for both customer and internal communication used throughout the incident response process, e.g. postmortem/root cause analysis, end-to-end repair item definition, fixes in production.
* Automation - You will have a clear understanding of automation and orchestration principles, and will be eager to automate, wherever and whenever the possibility arises, while simultaneously eliminating technical debt. Automation must be part of your DNA.
* Prevention - Using data-driven incident findings, you will work on solutions that ultimately prevent the incident or problem from recurring, and on interim solutions that resolve the problem more quickly next time.
* Service Requirements - You will provide direction and prioritization to Service Product Management and Service Development teams to engineer and add premier SRE capabilities to the Oracle SaaS/ERP services.
* SRE Testing Requirements - You will partner with Service Assurance and Test Engineering to ensure that monitoring, security, resiliency, scale, and performance are represented as a first-class testing priority in the overall testing strategy.

o You will ensure that the testing requirements include both positive and negative testing techniques for these SRE areas.

o You will ensure that testing requirements and the requirement backlog are curated on an ongoing basis using data-driven methods that leverage fleet and customer feedback (e.g. SRs, Bugs/ERs) and that these insights are translated into testing changes and investments.

* Evangelize and Educate - You will play a critical role in making the transformational culture change to an SRE mindset within Service Development. You will be responsible for evangelizing and educating Service Product Management and Service Development on the service-centric, full stack approach and principles of SRE, as well as the architectures and solutions used for Oracle SaaS/ERP services.

Skills and Qualifications

* Minimum of 5 years of software development, with demonstrated knowledge of professional software engineering best practices for the full software development life cycle, including coding standards, code reviews, source control, build and release processes, continuous deployment, and test suite development and maintenance.
* 2 years of relevant experience deploying and running large-scale online systems built on Cloud platforms such as Oracle Cloud, AWS, Azure, Google Cloud Platform, and/or OpenStack.
* Experience designing and implementing solutions for platform and application layer telemetry, monitoring, scalability, performance and reliability.
* Experience coordinating resources across diverse teams to restore service and maintain SLAs; ITIL certification is preferred.
* Excellent written and verbal technical communication with technical and non-technical peers, customers, and at times, executive leadership.
* Proven success in contributing in a collaborative, team-oriented environment, with the ability to establish and nurture relationships between multiple teams and navigate dependencies.
* 3 years of experience:

o Working in systems and network administration, application security, DevOps and/or Site Reliability Engineering.

o Hands-on with web protocols and Linux/Unix tools and architecture, from kernel to shell, file systems, and client-server protocols.

o Using C#, PowerShell/Shell script, ASP.NET/MVC, JavaScript, TypeScript, React, or T-SQL.

o Maintaining, analyzing, and troubleshooting large-scale distributed services.

o Building automated tools in Python, Java, Go, and/or Ruby.

* Experience with monitoring and alerting using technologies like Prometheus, Sensu, Nagios, Kafka, Wavefront, BigPanda, DataDog, and/or PagerDuty.
* Experience designing, implementing, and deploying Docker, Kubernetes, and serverless (Lambda) solutions.
* Experience with Oracle Linux, Red Hat Linux, Ubuntu, CentOS, CoreOS, and/or Amazon Linux.
* Experience with one or more orchestration and deployment tools, e.g. CloudFormation, Terraform, Ansible, Packer, and/or Chef.
* Experience with one or more CI tools: Jenkins, TeamCity, Bamboo, Artifactory.
* Experience with configuration management systems such as Ansible, Chef, or Puppet.
* Experience with Agile software development practices.
* Knowledge of testing methodologies, the testing pyramid (e.g., Unit, Integration, UI, E2E), testing frameworks, and testing automation tools like QTP, OATS, and Selenium.
* Self-driven to keep moving things forward even in the face of ambiguity and imperfect knowledge (resilient to hazards of "analysis paralysis").
* BS in Computer Science or related field and 10 years relevant experience.

About Oracle

Headquarters: 500 Oracle Parkway
Size: 10001 employees
