Azure Cosmos DB (cosmosdb.com) is Microsoft's next-generation globally distributed, massively scalable, multi-model cloud database service. It is designed to enable developers to build planet-scale applications. Azure Cosmos DB is one of the fastest-growing Azure services. Joining the Azure Cosmos DB team is a fantastic opportunity to work with highly talented engineers who operate like a startup, and to deliver on our next set of big challenges.
We have several positions open across many service areas and at different levels (L62+).
Monitoring & Customer Telemetry
Monitoring is at the heart of every online service. This team is responsible for building a highly scalable and reliable monitoring solution that delivers world-class telemetry to our customers and to the service. With the service growing rapidly (500 million requests per minute), the system design must scale to meet customer expectations for metrics at the finest granularity. We have many technical challenges in service health, monitoring, and customer-facing telemetry. It is critical for us to find the right ways to measure rich metrics, provide tools and data for diagnostics, gain deep insight into service health and state, and establish the proper operational models to manage the system. These are among the top asks, especially from our enterprise customers, who want telemetry data for auditing, deeper analysis of their performance, modeling their usage growth, and so on.
We have interesting challenges in how we maintain and expose telemetry at scale (petabytes of data), and we invest in automated analysis of large telemetry datasets, anomaly detection systems, automation bots, and systems that check SLA guarantees (e.g., the consistency guarantees the service makes). The Monitoring team is responsible for designing the end-to-end system across the stack, and we are looking for full-stack engineers to help us achieve our mission of building a highly reliable and scalable monitoring solution in a COGS-efficient manner. As a member of the team, you will be responsible for solving the challenges of handling gigabytes of data per minute and building passive and active health models around it.
Below are some of the challenges you will get a chance to work on:
* Developing low-latency, high-granularity metrics with very low COGS impact
* Building a scalable pipeline that handles billions of customer requests
* Building a billing system that reliably bills every customer and scales with growth
* Building a highly reliable health model to monitor one of the fastest-growing Azure services
In this role, you will also learn about Azure Monitoring and the Geneva system, and work on integration with Azure monitoring, Insights and Diagnostics like Azure advisory for customer assets.
Performance
Performance is at the heart of every service, and the Performance team in Cosmos DB is looking for strong technical engineers to help us continue to grow our service at a high rate and build on the performance guarantees that our customers love. We are looking for highly motivated, self-driven individuals who are passionate about product performance, resource governance, load balancing, and COGS. This team strives for ever-higher performance levels: improving the product to deliver single-digit-millisecond P99 latencies backed by SLAs on both latency and throughput, improving COGS for the service, and ensuring the service code continues to meet a high performance bar.
The team's scope includes the following areas:
* Improving product performance, building automation to establish absolute performance levels, and building sophisticated models to detect product performance regressions.
* Building automated analysis of performance-related telemetry datasets to derive actionable insights, understand gaps, and drive features and improvements.
* Defining the COGS model and operational principles to run the service at optimum COGS and drive high-gross-margin offers.
* Developing customer-facing offers and new features, as well as defining, measuring, and improving the throughput SLA and latency SLA.
* Building resource governance and load balancing features that ensure the products meet their defined SLAs.
Security
We provide the world with a scalable, fault-tolerant, globally replicated database and massive scale-up compute. These systems are used to solve the toughest financial, IoT, warehousing, AI, and state-management problems, along with many other solution areas such as gaming. These solutions serve everyone from hobbyist developers to Fortune 500 companies. Our job is to make sure these systems are secure and meet the security requirements of industry while also driving more defense in depth. We do this by building security features such as Encryption at Rest, firewalls, better permissions management, and broad mitigations, and by driving the overall Security Development Lifecycle (SDL) process. This job requires a broad set of developer and program management skills. We are looking for experts in pen testing, secure feature design, SDL, and security compliance certifications such as PCI-DSS and FIPS 140-2. If you have a passion for building and breaking massively scalable cloud infrastructure and database systems (IaaS/DBaaS), this role is ideal for you.
5+ years of coding experience in C, C++, and C#.
A Master's degree (or Bachelor's degree with 5+ years of work experience equivalent) in computer science or a related field.
5+ years of experience building and shipping production software or services, experience building or running systems at scale, and an interest in service fundamentals.
Experience with database systems (SQL Server, Cassandra, MongoDB, MySQL, PostgreSQL, Redis, etc.) is a plus.
Experience in developing cloud services, service management and service operation tools.
Experience in developing and maintaining engineering systems and tools that support large development teams.
Experience working with large code base and complicated systems.
Experience using agile methodologies or test-driven development (TDD).
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role.
These requirements include, but are not limited to the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
We are looking for experienced software engineers with:
Strong customer passion, accountability and drive who can take initiative and accomplish big goals.
An entrepreneurial spirit with a can-do attitude: self-starter, project finisher, and adaptable.
Great communicator, able to analyze and clearly articulate complex issues and technologies understandably and engagingly.
Strong design and problem-solving skills, with a bias for designing at scale. Hands-on experience shipping large-scale, commercial online software solutions.
Experience with multi-tenant services, including the resource isolation and governance aspects of running one, is a plus.
Microsoft develops, licenses, and supports software, services, devices, and solutions.