
Hadoop Engineer
Somerville, MA


Job Description

As a not-for-profit organization, Partners HealthCare is committed to supporting patient care, research, teaching, and service to the community by leading innovation across our system. Founded by Brigham and Women's Hospital and Massachusetts General Hospital, Partners HealthCare supports a complete continuum of care including community and specialty hospitals, a managed care organization, a physician network, community health centers, home care and other health-related entities. Several of our hospitals are teaching affiliates of Harvard Medical School, and our system is a national leader in biomedical research.

We're focused on a people-first culture for our system's patients and our professional family. That's why we provide our employees with more ways to achieve their potential. Partners HealthCare is committed to aligning our employees' personal aspirations with projects that match their capabilities and creating a culture that empowers our managers to become trusted mentors. We support each member of our team to own their personal development, and we recognize success at every step.

Our employees use the Partners HealthCare values to govern decisions, actions and behaviors. These values guide how we get our work done: Patients, Affordability, Accountability & Service Commitment, Decisiveness, Innovation & Thoughtful Risk; and how we treat each other: Diversity & Inclusion, Integrity & Respect, Learning, Continuous Improvement & Personal Growth, Teamwork & Collaboration.

Overview

* We are looking for a self-motivated Data Engineer to join our data engineering team.
* Design, develop, construct, test, and maintain architectures such as data lakes and large-scale data processing systems
* Select tools from the big data ecosystem and perform proof-of-concept (POC) analysis
* Gather and process raw data at scale to meet functional/non-functional business requirements (including writing scripts, REST API calls, SQL queries, etc.)
* Develop data set processes for data modeling, mining, and production
* Integrate new data management technologies (e.g., Collibra, Informatica DQ) and software engineering tools into existing structures
* The candidate will be responsible for participating in building a new data lake, expanding and optimizing our data platform and data pipeline architecture, and optimizing data flow and collection for cross-functional teams.
* The ideal candidate is an experienced data pipeline builder who enjoys optimizing data systems and building them from the ground up.
* The Data Engineer will support our Software Developers, Database Architects, Data Analysts, and Data Scientists on data initiatives and will ensure that optimal data delivery architecture is consistent throughout ongoing projects.
* They must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products.
* The right candidate will be excited by the prospect of optimizing and/or re-designing our data architecture to support the next generation of products and data initiatives.

Principal Duties and Responsibilities

* Create and maintain optimal data pipeline architecture; assemble large, complex data sets that meet functional/non-functional business requirements on Hadoop and relational data systems
* Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, etc.
* Build the Hadoop infrastructure required for optimal extraction, transformation, and loading of data from traditional/legacy data sources (a minimal sketch follows this list).
* Work with stakeholders including the Management team, Product Owners, and Architecture teams to assist with data-related technical issues and support their data infrastructure needs.
* Create data tools for analytics and data science team members that assist them in building and optimizing our product into an innovative industry leader.
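
For illustration only: the extract-transform-load work described above might look like the minimal PySpark sketch below. The database URL, table names, column names, and data lake paths are hypothetical placeholders, not details of this role's actual environment, and the job assumes a suitable JDBC driver is on the classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = (SparkSession.builder
         .appName("legacy-extract")   # hypothetical job name
         .enableHiveSupport()
         .getOrCreate())

# Extract: pull a table from a legacy relational source over JDBC.
# URL, table, and credentials are placeholders for illustration.
encounters = (spark.read
              .format("jdbc")
              .option("url", "jdbc:sqlserver://legacy-db:1433;databaseName=ehr")
              .option("dbtable", "dbo.encounters")
              .option("user", "etl_user")
              .option("password", "***")
              .load())

# Transform: normalize types and drop obviously bad rows.
cleaned = (encounters
           .withColumn("admit_date", to_date(col("admit_ts")))
           .filter(col("patient_id").isNotNull()))

# Load: land the result in the Hadoop data lake as a partitioned Hive table.
(cleaned.write
 .mode("overwrite")
 .partitionBy("admit_date")
 .saveAsTable("datalake_raw.encounters"))
```
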
Qualifications

* 3-5 years of experience building and optimizing 'big data' data pipelines, architectures, and data sets.
* Advanced hands-on SQL knowledge and experience working with relational databases for data querying and retrieval.
* Experience with big data frameworks/tools: Hadoop, Kafka, Spark, etc.
* Experience with relational SQL and NoSQL databases, including MS SQL Server, Hive, and HBase.
* Experience performing root cause analysis on data and processes to answer specific business questions and identify opportunities for improvement.
* Experience with data security.
* Experience building processes that support data transformation, data structures, metadata, dependency, and workload management.
* Experience supporting and working with cross-functional teams in a dynamic environment.

Experience with Java and/or Python a plus.

* Hadoop-based technologies (e.g., HDFS, Spark); Spark experience is a must (see the streaming sketch after this list)
* Strong SQL skills on multiple platforms (MPP systems preferred)
* Database architectures
* Data modeling tools (e.g., Erwin, Visio)
* 5 years of programming experience in Python and/or Java
* Experience with continuous integration and deployment
* Strong Unix/Linux skills
* Experience in petabyte-scale data environments and integration of data from multiple diverse sources
* Kafka, cloud computing (Azure, AWS), machine learning, text analysis, NLP, and web development experience is a plus
* Healthcare experience, most notably with clinical data, Epic, payer data, and reference data, is a plus but not mandatory
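
As a rough illustration of the Spark and Kafka skills listed above, here is a minimal Structured Streaming sketch that lands a Kafka topic in a data lake. The broker address, topic, event schema, and paths are all hypothetical, and the job assumes the spark-sql-kafka connector package is available on the cluster.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = (SparkSession.builder
         .appName("event-stream")   # hypothetical app name
         .getOrCreate())

# Hypothetical schema for incoming JSON events.
schema = StructType([
    StructField("patient_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read from a Kafka topic; broker and topic names are placeholders.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "clinical-events")
          .load()
          .select(from_json(col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Land the stream in the data lake as Parquet, partitioned by event type.
query = (events.writeStream
         .format("parquet")
         .option("path", "/datalake/raw/clinical_events")
         .option("checkpointLocation", "/datalake/_checkpoints/clinical_events")
         .partitionBy("event_type")
         .start())

query.awaitTermination()
```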

Skills Required

* Expertise in Hadoop data lake and relational data warehouse platforms
* Demonstrated experience with Hadoop big data technologies (Cloudera, Hortonworks) and data lake development
* Experience with real-time data processing and analytics products
* Experience with at least one cloud-based data technology (AWS, Azure, GCP)
* Cloudera or Hortonworks certification preferred
* Cloud certification preferred
* Large data warehousing environments on at least two database platforms (Oracle, SQL Server, DB2, etc.)
* Programming experience in Python, Java, and SQL; .NET and C# a plus
* ETL and data processing expertise in Hadoop (MapReduce, Spark, Sqoop) as well as SSIS, HealthCatalyst, and Informatica
* Familiarity with data governance and data quality principles; experience with data quality tools a plus (a small validation sketch follows this list)
* Ability to independently troubleshoot and performance-tune large-scale data lake and enterprise systems
* Knowledge of data architecture principles, data warehousing, agile development, and DevOps methodologies
* Understanding of change management techniques and the ability to apply them
* Excellent verbal and written communication, problem-solving, and negotiation skills
* Ability to act as an effective, collaborative team member
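
As a sketch of the data quality work mentioned above, the following hypothetical PySpark checks compute per-column null rates and duplicate-key counts on a curated table. The table and column names are assumptions for illustration, not part of an actual Partners HealthCare schema.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, when

spark = (SparkSession.builder
         .appName("dq-checks")   # hypothetical job name
         .enableHiveSupport()
         .getOrCreate())

# Hypothetical curated table to validate.
df = spark.table("datalake_curated.encounters")

# Completeness: null rate for each critical field.
critical = ["patient_id", "encounter_id", "admit_date"]
null_rates = df.select([
    (count(when(col(c).isNull(), c)) / count("*")).alias(f"{c}_null_rate")
    for c in critical
])
null_rates.show()

# Uniqueness: encounter_id is expected to be a primary key.
dupes = df.groupBy("encounter_id").count().filter(col("count") > 1)
print(f"duplicate encounter_ids: {dupes.count()}")
```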
