Site Reliability Engineer Q1/FY20
At Segment, we believe companies should be able to send their data wherever they want, whenever they want, with no fuss. Unfortunately, most product managers, analysts, and marketers spend too much time searching for the data they need, while engineers are stuck integrating the tools they want to use. Segment standardizes and streamlines data infrastructure with a single platform that collects, unifies, and sends data to hundreds of business tools with the flip of a switch. That way, our customers can focus on building amazing products and personalized messages for their customers, letting us take care of the complexities of processing their customer data reliably at scale. We're in the running to power the entire customer data ecosystem, and we need the best people to take the market.
Site Reliability Engineers (SRE) at Segment are members of the engineering team whose primary goal is to ensure the reliability, flexibility, and cost-effectiveness of our production infrastructure.
While these responsibilities are shared with the entire engineering team, SREs build and maintain the portions of our stack that ensure the entire team can confidently ship software day in and day out. They complement other engineers with their deep knowledge of the fundamental pieces of technology that underpin our production infrastructure. The SRE team are our in-house experts on building reliable, maintainable systems and they are responsible for setting the direction that determines how we go about constructing and deploying our production environment.
Here is an example of a high-impact project that one of our SREs, Achille, spearheaded to materially drive down our operational costs: https://segment.com/blog/the-million-dollar-eng-problem/
What we do:
* We build and maintain the fundamental infrastructure that runs Segment.
* We write software to automate and introspect our production systems.
* We spend 50% of our time writing tools and backend systems.
* We share a 24x7 on-call rotation with the other engineers.
* We teach other teams to reliably and cost-effectively operate and maintain their services.
* We take proactive steps to improve our availability, reliability, and efficiency.
Who We Are Looking For:
* You want to write software to improve the operability, reliability and efficiency of Segment's production systems.
* You have a rare ability to inspire engineering teams to up their reliability game.
* You aim to dig into problems and burrow until the solution is found.
* You have a strong interest in correctness, automation, and efficiency.
Projects You Could Work On:
* Build software that improves the reliability, performance, and efficiency of Segment's high-throughput, large-scale SaaS platform.
* Collaborate with the whole engineering team on projects as the expert on reliability, performance, and efficiency.
* Automate away the process of handling capacity, safely deploying software, and mitigating failures.
* Troubleshoot and mitigate the thorniest problems in our most important systems. Advise the team during postmortems on effectively avoiding repeated incidents.
* Work with cutting-edge technology, share with others through open source, and spread your expertise through contributions to our engineering blog.
Qualifications:
* CS Degree and/or a demonstrable, solid understanding of CS fundamentals.
* Proficient software engineer: strong with at least one programming language, such as Go or Python.
* Solid grasp of Linux systems.
* Excellent communicator; writes great documentation.
* Experience operating large-scale, distributed systems on top of cloud infrastructure such as Amazon Web Services (AWS) or Google Cloud Platform (GCP).
* Broad understanding of the OS and of networking protocols with demonstrated ability to apply this understanding to solve real problems.
* Strong proficiency with OS tuning and kernel internals, and expertise with application debugging tools.
* Experience with container orchestration platforms such as ECS, Kubernetes, or Mesosphere.
* Strong sense of urgency and ownership over critical problem areas.
* Knowledge of networking protocols and network programming.
* Experience writing production-quality code in Go.
* Experience with Datadog and SignalFx.
Equal opportunity statement
Segment is an equal opportunity employer. We believe that everyone should receive equal consideration and treatment in all terms and conditions of employment regardless of sex, gender (including pregnancy, childbirth, breastfeeding or related medical conditions), sexual orientation, gender identity, gender expression, race, color, religion, creed, national origin, ancestry, age (over 40), physical disability, mental disability, medical condition, genetic information, marital status, domestic partner status, military or veteran status, height, weight, AIDS/HIV status, and any other protected category under federal, state or local law. Pursuant to the San Francisco Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.