Data Engineer, Ingestion
Data Acquisition & Ingestion Team Overview
The Data Acquisition & Ingestion team is responsible for collecting, ingesting, and normalizing the data that powers Helio. We are building an automated ingestion system that can scalably and reliably onboard hundreds of disparate data sources and extract value from them. We are also champions of high-quality data practices across all of Helio's data pipelines.
Some of the technologies we leverage are Python 3, PySpark, AWS, Docker, Kubernetes, Postgres, Airflow, Jenkins, and Git.
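For a flavor of how a few of these tools fit together, here is a minimal, purely illustrative sketch of a daily Airflow DAG that fetches a raw extract and hands it to a normalization step. It is not a description of our actual codebase; the DAG id, task ids, and task bodies are all hypothetical stand-ins.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def fetch_source(**context):
    # Pull the latest raw extract from an upstream source (stubbed here).
    ...


def normalize_source(**context):
    # Validate and normalize the raw extract into a common schema (stubbed here).
    ...


with DAG(
    dag_id="example_source_ingestion",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    fetch = PythonOperator(task_id="fetch_source", python_callable=fetch_source)
    normalize = PythonOperator(task_id="normalize_source", python_callable=normalize_source)

    # Fetch must succeed before normalization runs.
    fetch >> normalize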
We are looking for a Senior Data Engineer to help us design and develop data pipelines that ingest, validate, extract, and normalize data across new and existing sources. The ideal candidate is self-driven and comfortable balancing progress toward a longer-term roadmap with maintaining context and stability across a dynamic set of existing data sources. While this is an individual contributor role, we are looking for someone excited to mentor junior engineers.
Responsibilities
Provide senior-level contributions to the design, implementation, and maintenance of complex data pipelines
Build reliable services for gathering and ingesting data from a wide variety of sources
Build performant and reliable data pipelines to validate, extract, and normalize data from a wide variety of sources (a minimal sketch of one such step follows this list)
Develop strategy, tools, and workflows for the integration and ingestion of data
Collaborate with cross-functional teams and stakeholders to understand data needs
Write quality, maintainable code with extensive test coverage in a fast-paced, agile software engineering environment
Mentor junior teammates and lead by example in demonstrating software engineering best practices
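For candidates curious what the validate/extract/normalize step referenced above looks like in practice, here is a minimal, purely illustrative PySpark sketch; the bucket paths, column names, and quarantine convention are hypothetical stand-ins, not our production layout.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("normalize_source_a").getOrCreate()

# Read a raw extract (path and schema are hypothetical).
raw = spark.read.json("s3://example-bucket/raw/source_a/2024-01-01/")

# Validate: quarantine records missing a primary key for review
# rather than silently dropping them.
valid = raw.filter(F.col("record_id").isNotNull())
invalid = raw.filter(F.col("record_id").isNull())
invalid.write.mode("overwrite").parquet("s3://example-bucket/quarantine/source_a/")

# Normalize: cast types, stamp the ingestion time, and project onto
# the target schema.
normalized = (
    valid.withColumn("record_id", F.col("record_id").cast("string"))
    .withColumn("ingested_at", F.current_timestamp())
    .select("record_id", "payload", "ingested_at")
)

normalized.write.mode("append").parquet("s3://example-bucket/normalized/source_a/")

A production pipeline would parameterize the source and run date and draw the target schema from a shared definition rather than hard-coding either.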
Requirements
Hold a B.S. or M.S. in Computer Science or an equivalent degree
5-7+ years of professional experience as a data engineer
Excellent software engineering skills and strong fundamentals in algorithms, data structures, predictive modeling, and big data concepts
Strong programming fundamentals and proficiency in an object-oriented language such as Python or Scala
Excellent communication skills to collaborate with stakeholders in engineering, data science, and product
Nice to Have
Experience with our stack (Python, PySpark, Airflow, and the AWS ecosystem)
Experience building large-scale, complex data processing pipelines
A successful history of manipulating, processing, and extracting value from large, disconnected datasets
Strong analytical skills related to working with unstructured datasets
Useful Traits for this Role
Strong communication skills, both technical and business-level, especially when working with external contractors
Detail-oriented, with business sense and the ability to manage ambiguity; able to synthesize detailed schema specifications from a newly identified source
Ability to understand, maintain, document, and stay knowledgeable about a large variety of data sources; comfortable with a certain level of reactive context-switching
Proactive and driven; will identify gaps in our data model and work to improve it