Site Reliability Engineer
Woodland Hills, CA

Job Description

Responsibilities:

* Create and maintain a continuous testing framework that observes, records, and trends real-time availability data for all of our clients
* Develop and maintain on-premises and cloud capacity plans that ensure we are delivering a BlackLine service that is performant and cost-effective
* Collaborate with development and other technology teams on requirements definition, capacity planning, and process refinement
* Improve the BlackLine SaaS service experience by discovering and highlighting optimization opportunities in existing code to address application availability, performance, observability, efficiency, and security challenges.
* Develop tools and systems to automate the identification, analysis, and remediation of application events, infrastructure issues, or requests.
* Establish and maintain Key Performance Indicators (KPIs) for the overall health of the service, and build tools to exercise and evaluate whether these KPIs are being met.
* Work cross-functionally with other teams to surface common pain points, architect solutions, establish conventions, and evangelize application development and operations best practices.
* Transform discoveries into requests to others or action items for you and your team.
* Regularly learn new systems and tools as the BlackLine platform and ecosystem evolve.
* Own and evolve the BlackLine trust site to include real-time availability and performance information
* Contribute knowledge, skills, and personal qualities to a dedicated team of top engineers solving real-life problems in a bleeding-edge, high-performance, and high-traffic environment.
* Assess, test, track, predict, and report on the performance, responsiveness, capacity, and availability of a suite of production applications.
* Publish performance findings, conclusions, and recommendations
* Create second-tier analyses of capacity constraint points and performance, and discuss them with the development and infrastructure teams
* Support integration of performance data into customer experience analytics tools and reporting
* Ensure application and infrastructure capacity management efforts have verifiable capacity data to support business cases
* Monitor industry trends and keep abreast of new tools and technologies.
* Participate in our on-call rotation and conduct incident reviews
* Other duties as assigned

Requirements:

* BS or MS in Computer Science (or equivalent diploma and/or certifications) with 3-5 years of related experience.
* Intermediate to advanced knowledge of at least one of the following programming languages: C#, Visual Basic, PowerShell, Java, Go, Linux Shell, Ruby.
* Demonstrated history of developing or operating production web applications and a solid understanding of HTTP(S), HTML, JavaScript, CSS, and XML.
* Knowledge of software development best practices and SDLC.
* Experience deploying high availability systems and software.
* Experience with troubleshooting distributed web applications in a production environment.
* Intermediate-level knowledge of IIS and Windows Server or Linux and Apache.
* Experience with infrastructure as code and platform as a service.
* Experience with configuration management tools (e.g., Chef, Ansible, Puppet).
* Must possess the ability to handle multiple goals concurrently and function in a fast-paced, demanding, ever-changing, high-growth environment
* Must maintain the highest level of integrity, courtesy and respect while interacting with internal customers, employees and business contacts
* Excellent oral and written communication skills
* Ability to interface with internal technical experts using professional interpersonal skills
* Experience analyzing datasets to draw conclusions and graphing the data that supports those conclusions
* Exhibit creative problem-solving, logical troubleshooting and analytical skills
* Basic proficiency in application load balancing methods (F5 LTM, Windows NLB, etc.)
* Working knowledge of TCP/IP and networking concepts
* Proficiency with statistical concepts: confidence intervals, hypothesis testing, sampling
* Understanding of operating system concepts such as CPU, memory, and disk queues, and experience graphing and analyzing these metrics over time
* Must possess strong organizational skills and be able to work with minimal oversight
* Ability to understand new technologies quickly and adapt these into daily work and goals

Preferred Requirements:

* Prior C#, ASP.NET, Ruby, Go or Java development experience, preferably in an agile SaaS environment
* Significant experience with open source platforms and technologies.
* Experience with software development processes and methodologies
* Track record of architecting, developing, and implementing robust, distributed online solutions
