Job Description
Who we are
SREs work on improving the availability, scalability, performance and reliability of Twitter's production services.
Twitter is looking for a Senior Site Reliability Engineer to join our Cloud Cache and Storage Infrastructure SRE team. Our team is dedicated to expanding our infrastructure, automation, and tooling for our Cloud Cache and Storage systems. Our team's mission is to provide safe, reliable and secure cache and core storage systems and to automate and operate these systems at scale.
What you'll do
* You will work in an engineering team to design, build, and maintain cache layers and key-value, relational, and binary file storage systems
* You will build automation and tooling in Python and other languages to manage our cache and storage services and their infrastructure
* You will perform deep dives into systemic and latent reliability issues, service performance, and capacity modeling
* You will troubleshoot issues across the entire stack: hardware, software, application, and network
* You will consult with customer teams on their service use patterns and identify anti-patterns and optimization strategies
* You will mentor SWEs on standard methodologies across multiple disciplines including proper service selection, monitoring, and troubleshooting complex code issues
* You will drive standardization efforts across the services, infrastructure, systems and practices
* You will develop new software-based solutions to infrastructure engineering problems
Who you are
* You have a solid understanding of systems and application design, including the operational trade-offs of various designs
* You have knowledge of various aspects of service design, including messaging protocols and behavior, caching strategies, and software design practices
* You have practical, solid knowledge of shell scripting and at least one higher-level language (Python or Ruby preferred)
* You have an expert understanding of Linux systems, services, optimization, storage subsystems, and file systems
* You have demonstrable knowledge of TCP/IP and HTTP, and experience supporting multi-tier application architectures
* You have a minimum of 5 years of experience handling services in a large-scale environment
* You are able to prioritize tasks and work independently
* You have excellent written communication, interpersonal communication, and documentation skills
Desired
* Practical experience in Java or Scala
* Advanced knowledge of Python or Ruby, sufficient to build, write, and support complex services
* Ability to lead technical teams through design and implementation across an organization