Architect and operate high-quality, large-scale, multi-geo data pipelines that drive business decisions.
* Redesigned data pipelines that use the applicable DBR features and incorporate external tools where necessary, for better reliability and tighter SLAs.
* Established conventions or new APIs for logging feature usage for PM use-cases.
* Understandable SLAs for each of the production data pipelines.
* Improved test coverage (90+%) for data pipelines. Best practices and frameworks for unit, functional and integration tests.
* CI and deployment processes and best practices for the production data pipelines.
* Reduction in overall alert noise and an increase in responsiveness, achieved by rethinking the current alert categories and priorities.
* Design schemas for financial, sales and support data in the data warehouse.
* Experience building, shipping and operating multi-geo data pipelines at scale.
* Experience working with and operating workflow or orchestration frameworks, including open source tools like Airflow and Luigi or commercial enterprise tools.
* Experience with large-scale messaging systems like Kafka or RabbitMQ, or with commercial systems.
* Excellent communication skills (writing, conversation, presentation); a consensus builder.
* Strong analytical and problem-solving skills.
* Passion for data engineering and for enabling others by making their data easier to access.
* Experience with pipelines that are used by many downstream teams, including non-engineering functions.
* Experience with streaming data frameworks like Spark Streaming, Kafka Streams, Flink and similar tools is a plus.
* Experience working with Apache Spark and data warehousing products.
* Direct experience with a log collection and aggregation system at scale.
* Demonstrated execution at a growth stage technology company.
* Medical, dental, vision
* 401k Retirement Plan
* Unlimited Paid Time Off
* Catered lunch (every day), snacks, and drinks
* Gym reimbursement
* Employee referral bonus program
* Awesome coworkers
* Maternity and paternity plans
Databricks' mission is to accelerate innovation for its customers by unifying Data Science, Engineering and Business. Founded by the original creators of Apache Spark™, Databricks provides a Unified Analytics Platform for data science teams to collaborate with data engineering and lines of business to build data products. Users achieve faster time-to-value with Databricks by creating analytic workflows that go from ETL and interactive exploration to production. The company also makes it easier for its users to focus on their data by providing a fully managed, scalable, and secure cloud infrastructure that reduces operational complexity and total cost of ownership. Databricks, venture-backed by Andreessen Horowitz, NEA and Battery Ventures, among others, has a global customer base that includes Viacom, Shell, and HP. For more information, visit www.databricks.com.
Apache, Apache Spark and Spark are trademarks of the Apache Software Foundation.
Databricks is a company developing a unified data analytics platform.