We are building a reliable platform for Pinterest engineers to produce and consume signals about pins, boards, and users. These signals power critical Pinterest product features, such as home feed, search, recommendations, ads, and trust & safety. The signal platform is also the foundation for our unified ML feature store. It provides a uniform way to access data generated in batch, incrementally, and in real time, along with complete data governance capabilities: quality and freshness monitoring, lineage tracking, ownership, cost attribution, and lifecycle management.
What you'll do:
* Own, improve, and scale our batch signal platform which produces 150+ critical signals and processes hundreds of TB of data each day, including the core Pinterest index of pins, boards and users.
* Drive the roadmap for the next-generation real-time signal platform and build the system to incrementally update signals in real time.
What we're looking for:
* Expertise in real-time/streaming data processing (e.g. Apache Flink) and batch processing (e.g. Hadoop MapReduce) at consumer Internet scale, using Java and Python
* Knowledge of distributed systems and large-scale online serving architecture
* Strong ability to communicate and coordinate efforts across teams
About Pinterest
Pinterest is a visual bookmarking tool for saving and discovering creative ideas.