Integrity: Doing the right things for the right reasons
Agility: Adapting and thriving in a dynamic environment
Teamwork: Combining our strengths to do amazing things
Passion: Channeling enthusiasm to drive excellence
Creativity: Unleashing curiosity to defy the norm
About the role:
As a Software Data Engineer at 1010data, you will be responsible for designing, maintaining, and optimizing large-scale automated ELT processes. Working actively with data scientists and analysts specializing in enterprise data warehousing, you will leverage industry-standard data orchestration tools as well as in-house proprietary scheduling and automation tools to create efficient and reliable ELT jobs which support 1010data's product offerings and data warehousing needs for our customers. As we incorporate more cloud technologies into our processes, you will be at the forefront of exploring and defining best practices, and helping us transition our products to be more scalable.
As part of the onboarding process, you will learn about 1010data's proprietary technology stack. Our query engine, query language, database, and data storage layer were all developed and fine-tuned in-house over the lifetime of the company. ELT processes rely heavily on these components, whether they are written in Python and Airflow, K, or our proprietary data orchestration tools. As a new 1010data employee, you will be formally trained in the proprietary tools. The concepts should be familiar to anyone with exposure to database techniques like normalization/indexing/partitioning, MapReduce, columnar database architecture, and distributed systems.
What you will take on:
* Taking end-to-end ownership of data products and custom solutions for our clients
* Coordinating with the systems, core, data science, and analytics teams to build and maintain data products and custom solutions for our clients
* Designing and writing automated scripts to preprocess terabytes of data from our partners/clients
* Designing and writing new enterprise-scale ELT/ETL workflows from scratch in Python using Airflow, Docker, Kubernetes, AWS, etc.
* Modifying/redesigning legacy ELT/ETL processes to leverage cutting-edge open source and proprietary technologies
* Ensuring quality, reliability and uptime for critical automated processes
* Migrating our products and processes into the cloud while drastically reducing our in-house data center footprint
What you already have:
* At least 1-2 years of professional experience programming in Python
* Exposure to ETL/ELT pipeline automation
* Exposure to basic database concepts
* Good understanding of Data Engineering, NoSQL databases and database design, distributed systems and/or information retrieval
* Knowledge of Apache Airflow
* Familiarity with functional/vector programming
* DBA experience
* Ability to plan and collect requirements for projects, and interact with the analyst and data science teams
* STEM bachelor's degree required; a graduate degree is a big plus
1010data travels at the speed of thought to make Big Data discovery easy; we power sub-second responses to analyses run on billions of rows of data. 1010data is defining the way the world interacts with data. Come be a part of it. Come do powerful things with data.
An essential tool for more than 850 of the world's top retail, manufacturing, telecom, government, and financial services enterprises, including The New York Stock Exchange, Dollar General, P&G, and RiteAid, the 1010data platform is a highly differentiated product that is becoming the industry standard for Big Data Discovery and Data Sharing.
With more than 30 trillion rows of data in our private cloud, 1010data is designed to scale to the largest volumes of granular data, the most disparate and varied data sets, and the most complex advanced analytics, all while delivering lightning-quick system performance.
1010data is an equal opportunity employer. We embrace humans of every background, appearance, race, religion, color, national origin, gender, gender identity, sexual orientation, age, marital status, veteran status, and disability status.
1010data is a company developing a cloud-based platform for big data discovery and data sharing.