Data Engineer
San Francisco, CA

Job Description

Who we are:

Calico is a research and development company whose mission is to harness advanced technologies to increase our understanding of the biology that controls lifespan, and to devise interventions that enable people to lead longer and healthier lives. Executing on this mission will require an unprecedented level of interdisciplinary effort and a long-term focus for which funding is already in place.

Position Description:

Great software engineering and data science are increasingly crucial to biology. We are in the midst of an explosion in the quantity and quality of biological and medical data that are transformative to our understanding of biology and disease. But the tools to store, process, and analyze these data are often primitive, and in some cases don't yet exist. Calico is seeking an exceptional data engineer to join our computing group and be a part of changing that story.

In this role, you will work closely with computational and research scientists to define strategies and implement systems for modeling, collecting, storing, and accessing diverse scientific data and metadata. Collaborating with other scientists and engineers, you will design, build, and maintain databases and data warehouses that underpin our scientific endeavors and accelerate our ability to ask new, sophisticated questions spanning multiple organisms, data modalities, and timescales. You will not only build tools to support existing scientific workflows, but also help set the vision for future data generation and collection efforts.

If you are passionate about data, passionate about biology, and passionate about their intersection, this is the job for you.

What you'll do:

* Work with computational and research scientists to understand common analysis use cases and data access needs.
* Design strategies for data storage and integration across different data sources (both internal and external) for multiple use cases.
* Implement, document, and maintain processing pipelines, databases, and data warehouse infrastructure.
* Work closely with full-stack engineers to develop APIs and GUIs for accessing and visualizing scientific data.
* Set data engineering vision and drive both independent and collaborative software development projects end-to-end.
* Contribute to a range of projects, from one-off solutions to long-term, complex systems.
* Build out core infrastructure, tooling, and software development processes.

Position requirements:

* 5+ years working with contemporary ETL tools and frameworks.
* 3+ years building Python-based backend systems.
* Fluent knowledge of SQL.
* Experience implementing RESTful APIs, GraphQL, and other programmatic interfaces to complex multidimensional data.
* Experience deploying high-performance data backends in the cloud with Amazon Web Services, Heroku, Google Cloud Platform, or a similar service.
* Firm grasp of software testing and test-driven development.
* Demonstrated success in owning projects end-to-end, including working with non-technical stakeholders to define requirements and seek feedback.

Nice to have:

* Worked with machine learning tools and infrastructure, e.g. TensorFlow and PyTorch.
* Built back-ends for high-dimensional graph or network data.
* Worked in biology or life sciences, and have familiarity with databases and data types used by computational biologists.
* Built software with technologies like ElasticSearch, GraphQL, and Google Cloud Platform.

Some projects you may contribute to:

* Data warehouse: a system to extract, transform, and load public and private datasets into a single repository, and then make these data available for visual analysis with either off-the-shelf or custom-built GUIs.
* Exploratory data visualization & analysis tools: apps to help scientists explore and understand diverse, complex, and multidimensional data.
* Data platform: a modern React (front-end) and Python (back-end) application that our scientists use to manage and process experimental data.
* Automation: software to ingest and transform data from custom high-throughput instrumentation.
