PlanGrid is looking for a Senior Infrastructure Engineer to join our new Storage and Message Queue (SMQ) team.
The SMQ team ensures infrastructure uptime, provisioning, and configuration of datastores (Postgres, MongoDB, Redis, Elasticsearch, S3) and queuing systems (SQS, RabbitMQ) for PlanGrid engineering teams. A big part of our job is giving developers visibility into their services' performance through metrics, traces, and logs. Our team handles complex architectural projects that allow us to grow internationally, handle datacenter-level regional disaster recovery, and implement self-healing infrastructure across many AWS regions.
The SMQ team was formed to focus on larger initiatives under the ever-growing Infrastructure Engineering umbrella. In the last year, we re-architected the monolithic Postgres database, drastically reducing data loss and recovery times. We also led efforts to reduce the operational burden of provisioning and configuring new RDS databases, with solid observability in place. Every quarter, we test disaster recovery scenarios to verify backups and restores and to expose weaknesses in our architecture.
We follow a DevOps methodology (as opposed to old-school operations) in which developers, not operations people, are responsible for their code's reliable operation, and developers are empowered and trusted to make the changes necessary for reliability. Our work focuses on data stores and message queuing systems, but it touches every layer of the infrastructure and the development lifecycle. We love people who define success in terms of SLOs, SLIs, and SLAs, who care deeply about observability in distributed systems, who educate others on the intricacies of complex database architectures, and who have experience scaling large database systems out to multiple regions worldwide.
DevOps/SRE, database administration, and systems experience are highly valued. If you've gotten your hands dirty with package and configuration management, securing cloud resources with best practices, infrastructure-as-code principles, Kubernetes, Spinnaker, AWS cloud infrastructure, Postgres user management, Postgres replication strategies, and RabbitMQ cluster provisioning, and you know your way around Docker, Bash, and Python, we'd love to talk with you!
You should be passionate about getting in front of problems instead of waiting until things are on fire. If you dream of stability, love metrics, communicate well, document your code, and love building reliable systems that hum along and take care of themselves, we want you on the team.
Our responsibilities include:
* Enable faster provisioning and configuration of data stores and message queue systems to speed up the developer teams building out new products
* Teach our developers about handling database migrations during deployments, caching gotchas with Redis, optimizing search queries for Elasticsearch, data loss scenarios with SQS vs. RabbitMQ, and everything in between
* Automate infrastructure provisioning with CloudFormation/Terraform and SaltStack
* Build observability into every aspect of our production infrastructure enabling developer teams to manage their own storage and queuing systems
* Continually reduce RPO/RTO for our data stores and message queuing systems so that our sales team can offer better guarantees for our customers' data
* Participate in on-call rotations and be a model of how to manage incidents
In your first 6 months on the SMQ team, you will:
* Help build out new AWS regions and ensure observability and good RPO/RTO
* Automate failover processes for our monolithic Postgres database
* Research AWS Aurora as a replacement for our RDS and EC2-hosted Postgres databases
* Research and design new systems to enable developer teams to test application database load before reaching production
* Design and re-architect resilient and observable message queuing systems
* Help educate developers on writing efficient SQL queries for large web applications
* Migrate away from our legacy MongoDB cluster to enable better RPO/RTO
PlanGrid, an Autodesk company, builds simple, beautiful software that construction teams love to use. As part of Autodesk Construction Solutions (ACS), whose mission is to seamlessly connect the office, the trailer, and the field across the entire construction project lifecycle, PlanGrid's mobile-first solutions empower general contractors, subcontractors, owners, and architects to provide fast, accurate information to the field. With unparalleled adoption by field workers, PlanGrid is used on projects as the single source of truth for all construction data, including drawings, photos, and other critical documents. As a result, critical workflows are streamlined, efficiency improves, and field teams can take on more work and get more done. PlanGrid's software and other Autodesk Construction Solutions products enable a complete data set to move seamlessly through each phase of a building's lifecycle, from design and preconstruction to construction, turnover, and operations. PlanGrid is used on more than 1.5 million construction projects in 100+ countries.
Join us as PlanGrid and ACS advance Autodesk's leadership in construction.
As part of GDPR compliance procedures, we have posted our Recruiting Privacy Notice on our website. Please also note that the advertised position is an opportunity with Autodesk, Inc. (https://www.autodesk.com/), as Autodesk recently acquired PlanGrid. Processing of your personal information as part of the job application process, and as part of Autodesk employment should a candidate be hired, will be handled by Autodesk pursuant to Autodesk's Candidate Privacy Statement, available at: https://damassets.autodesk.net/content/dam/autodesk/www/content/careers/autodesk_candidate_privacy_statement.pdf.
PlanGrid is a cloud-based app that allows users to store blueprints and construction documents on iPad and iPhone.