MongoDB is growing rapidly and is seeking a Data Engineer to be a key contributor to its internal data platform. You will build data-driven solutions to help drive MongoDB's growth as a product and as a company. You will take on complex data-related problems using very diverse data sets.
Lead Data Engineers provide vision and expertise to help drive the MongoDB internal data engineering platform. On the Data Engineering team, you will help make high-level decisions about how we evolve the platform in the way that best helps us achieve our key results. We are solving very challenging problems of scale, and as the business grows, the data will grow with it and present an even bigger challenge.
In addition to the below technical requirements, we expect a lead who:
* Leads in an agile team environment and maintains a close relationship with internal data lake users
* Develops the team and encourages agility, such as incrementally shipping complex projects
* Provides technical leadership to evaluate potential solutions, determine the impact of the solution, recommend courses of action, and finally design and implement the solution
* Recommends industry-recognized data engineering best practices
* Provides technical guidance to ensure tasks are completed within established timelines, having established realistic timelines during the planning process
* Recognizes and adopts best practices in SDLC and Agile
* Leads the improvement of ongoing data engineering processes, automating or simplifying self-service support for datasets
* Enjoys learning and sharing their vision and expertise with others in an open, collaborative environment
You have experience with:
* Leading a team of data engineers
* Several programming languages (Python, Scala, Java, etc.)
* Data processing frameworks like Spark
* Streaming data processing frameworks like Kafka, KSQL, and Spark Streaming
* A diverse set of databases like MongoDB, Cassandra, Redshift, Postgres, etc.
* Different storage formats like Parquet, Avro, Arrow, and JSON
* AWS services such as EMR, Lambda, S3, Athena, Glue, IAM, RDS, etc.
* Orchestration tools such as Airflow, Luigi, Azkaban, Cask, etc.
* Git and GitHub
* CI/CD Pipelines
* You enjoy wrangling huge amounts of data and exploring new data sets
* You value code simplicity and performance
* You obsess over data: everything needs to be accounted for and thoroughly tested
* You plan effective data storage, security, sharing, and publishing within the organization
* You are constantly thinking of ways to squeeze better performance out of the pipelines
* You are deeply familiar with Spark and/or Hive
* You have expert experience with Airflow
* You understand the differences between storage formats like Parquet, Avro, Arrow, and JSON
* You understand the tradeoffs between schema designs, such as normalization vs. denormalization
* In addition to data pipelines, you're also quite good with Kubernetes, Drone, and Terraform
* You've built end-to-end, production-grade data solutions that run on AWS
* You have experience building ML pipelines using tools like SparkML, TensorFlow, scikit-learn, etc.
MongoDB is a company developing open-source document and NoSQL databases.