Infrastructure Software Engineer - Reliability Platform Uber
San Francisco, CA

Uber is a provider of a mobile application connecting passengers with drivers for hire.

Job Description

Uber Overview

At Uber, we ignite opportunity by setting the world in motion. We take on big problems to help drivers, riders, delivery partners, and eaters get moving in more than 600 cities around the world.

We welcome people from all backgrounds who seek the opportunity to help build a future where everyone and everything can move independently. If you have the curiosity, passion, and collaborative spirit, work with us, and let's move the world forward, together.

About the Role

Reliability Platform is a mission within Uber Infrastructure that is charged with measuring, monitoring, and providing tools for quick remediation of mission-critical service outages at Uber.

With teams in San Francisco, New York City, and Europe, Reliability Platform develops and provides a portfolio of measurement, monitoring, tracing, logging, dependency comprehension, stress testing, and on-call experience platforms. These platforms keep Uber's products highly reliable and scalable, and provide extreme leverage to software developers across Uber.

Reliability Platform is seeking an experienced back-end engineer to join our mission-critical Blackbox and Hailstorm team.

About Blackbox And Hailstorm

There are two major products that this team owns and develops.

Blackbox monitoring. This is a platform for running test cases which simulate Uber's core business flows, hitting the same external endpoints that our client-facing applications use. We build and operate this world-class external monitoring system deployed independently of Uber's infrastructure on multiple cloud providers (AWS, GCP). Blackbox is often the first system to detect major outages, and is invaluable to the company as it moves fast in bringing new features to market.

Now, we're expanding our scope to active probing as a platform within Uber's production zones. This will unlock a whole wave of new capability for engineers at Uber to monitor their systems for correct behavior, and add to the team's positive impact on the business.
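
As a rough illustration of the kind of test case described above, the sketch below runs an ordered business flow one step at a time, records latency per step, and stops at the first failure. It is not Uber's implementation: `StepResult`, `run_flow`, `flow_healthy`, and the step names are hypothetical, and the stub actions stand in for real requests to external endpoints.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    latency_s: float
    error: Optional[str] = None


def run_flow(
    steps: List[Tuple[str, Callable[[], None]]],
    clock: Callable[[], float] = time.monotonic,
) -> List[StepResult]:
    """Run each (name, action) step in order and stop at the first failure."""
    results: List[StepResult] = []
    for name, action in steps:
        start = clock()
        try:
            action()  # in a real probe: a request to an external endpoint
            results.append(StepResult(name, True, clock() - start))
        except Exception as exc:  # a probe should report failures, not crash
            results.append(StepResult(name, False, clock() - start, str(exc)))
            break
    return results


def flow_healthy(results: List[StepResult], expected_steps: int) -> bool:
    """A flow is healthy only if every expected step ran and succeeded."""
    return len(results) == expected_steps and all(r.ok for r in results)


if __name__ == "__main__":
    # Hypothetical flow with stub actions in place of real network calls.
    def succeed() -> None:
        pass

    def fail() -> None:
        raise RuntimeError("simulated 503 from endpoint")

    flow = [("login", succeed), ("request_ride", fail), ("complete_trip", succeed)]
    for result in run_flow(flow):
        print(result)
```

Stopping at the first failed step keeps the report focused on where the flow broke, which is the signal an external monitor needs in order to flag an outage.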

Hailstorm. This is a platform that guarantees that Uber can continue to scale at a fantastic pace. It runs thousands of integration and load tests against individual services at Uber. It also generates hundreds of thousands of simulated core business flows (rides, EATS orders, etc.) in Uber's production environment to stress the platform and measure the impact of that stress on our systems.

The Hailstorm team also plans to build a platform that fully automates load testing for both core trip flows and individual services. It will monitor live production traffic, use the forecast throughput as the target, and throttle tests to limit their impact on our production environment.
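
One way to picture the throttling idea in the paragraph above is a small control step that decides how much synthetic load to send next. This is only a sketch under stated assumptions, not Hailstorm's design: the function name, its parameters, the error-rate threshold, and the ramp and backoff factors are all hypothetical.

```python
def next_test_rate(
    current_rate: float,
    live_rate: float,
    forecast_target: float,
    error_rate: float,
    max_error_rate: float = 0.01,
    ramp_fraction: float = 0.1,
    backoff: float = 0.5,
) -> float:
    """Return the synthetic request rate to use for the next interval.

    Assumptions (hypothetical): all rates are requests per second, and the
    goal is for live traffic plus synthetic traffic to reach forecast_target.
    """
    # Production is showing stress: cut the synthetic load.
    if error_rate > max_error_rate:
        return current_rate * backoff

    # Synthetic load only fills the gap between live traffic and the forecast.
    goal = max(0.0, forecast_target - live_rate)

    # Ramp up gradually, but drop to the goal at once if live traffic grew.
    max_step = ramp_fraction * forecast_target
    if current_rate < goal:
        return min(goal, current_rate + max_step)
    return goal


if __name__ == "__main__":
    rate = 0.0
    for live, errors in [(600, 0.0), (600, 0.0), (600, 0.05), (800, 0.0)]:
        rate = next_test_rate(rate, live, forecast_target=1000, error_rate=errors)
        print(f"live={live} errors={errors} -> synthetic rate {rate}")
```

Backing off multiplicatively while ramping up in small additive steps is a common conservative choice for this kind of controller; a production system would likely use richer health signals than a single error rate.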

Our software engineering efforts are just getting started, so come help create the platforms that will help all engineers at Uber and tackle these challenges with a tight-knit team of experienced engineers.

What You'll Need

* A passion for engineering, centered on reliability, testing and monitoring.
* A strong customer-focused mindset. We work directly with engineers throughout Uber to build a reliable and user-friendly platform they use to run business-critical tests.
* An emphasis on high-quality, well-tested, and readable code. We use Go, but experience building large-scale back-end systems in languages like Java, C++, or Python is also valuable.
* BS/MS in Computer Science, a related technical discipline, or equivalent experience.

Nice To Have

* Experience building scalable, fault-tolerant, distributed systems.
* Experience with any/all of: Kubernetes, AWS, GCP, Elasticsearch, Consul, Terraform.