The successful candidate will be responsible for driving high quality and reliability of equipment used in Microsoft's cloud to meet and exceed our customers' expectations. Acts as the internal consultant on all reliability matters and interfaces with program management, vendors and design engineering (as necessary) on key reliability programs and issues. This includes creating or revising reliability engineering guidelines to improve product field performance through design enhancements that meet reliability goals. Uses principles of performance evaluation and prediction to improve the reliability and maintainability of Cloud Infrastructure servers, including PCBA (printed circuit board assembly). Identifies, collects, analyzes, and manages various types of data to minimize failures and improve product performance. Uses scripting and real-time capture of responses from electronic devices under test (DUTs) to determine proper operation of the DUT or to trace faults to root cause. Works cross-functionally to resolve reliability problems that result in excessive field failures.
In addition, the candidate will use Microsoft cloud performance IoT (Internet of Things)/telemetry data and traditional reliability engineering principles to determine and predict the reliability of critical commodities/parts. The successful candidate should be recognized as an expert in her or his technical field and have a proven track record of success. The candidate must demonstrate a detailed understanding of the significance of time to market, risk mitigation, contingency plans, return on investment, etc.
* Minimum B.S. in Electrical Engineering, Computer Engineering, Physics or Materials Science
* 8+ years of experience in hardware development designing moderately to highly complex hardware products
* Managing multiple design qualification activities and development schedules
* Ability to communicate, collaborate and lead cross-functionally to resolve issues, including with customers
* Excellent verbal/written communication, interpersonal and computer skills required
Microsoft is a highly innovative company that collaborates across disciplines to produce cutting-edge cloud technology that changes our world. The Cloud Server Infrastructure (CSI) team in Microsoft's Azure C+E division is responsible for delivering server infrastructure for Microsoft's online services. The hardware for operating these services (over 200 and counting) comprises hundreds of thousands of servers spread globally and applications that reach hundreds of millions of users every day. Our customer base is growing rapidly, our infrastructure investments are multiplying, and the size of our global infrastructure is increasing by the day, along with the scale of our challenges. Learn more about our team and projects here: Azure Hardware Infrastructure
The ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Responsibilities and capabilities: the candidate should possess most (or all) of the following:
* Understanding of Design for Reliability principles and Physics-of-Failure concepts to develop and implement accelerated tests to identify and mitigate risks and qualify engineering designs during product development
* Ability to use knowledge of product design and manufacturing processes to conduct Failure Modes and Effects Analysis (FMEA)
* Capable of applying Design-of-Experiments concepts to identify Critical-to-Quality parameters and develop robust evaluation plans based on them
* Knowledge of acceleration models for common failure mechanisms and stress types
* Knowledge of statistical techniques to analyze test data and create estimates for field failure rates
* Good understanding of fundamental properties and characteristics of materials used in cutting-edge consumer electronics products
* Familiarity with the application of simulation tools such as Finite Element Analysis to evaluate product performance (mechanical, electrical and thermal) is useful
* Experience with balancing the significance of time to market, risk mitigation plans and return on investment while creating and executing reliability plans during product development
* Completing measurement method analysis, gage correlation studies (GRR) and other data fidelity studies
* Completing data trend and variation analysis and creating engineering reports
* Summarizing the test data and performing data analysis in accordance with product design requirements
* Preparing test reports and communicating results and findings to the reliability engineer to facilitate root cause analysis and resolution when failures occur
* Ability to develop reliability stress hardware specifications and procedures, write test programs for reliability testing and characterization, and assist in the selection of new reliability lab equipment
* Ability to set up test equipment for functional tests at the subsystem (such as PCBA) and system levels, including product cabling and instrumentation for thermal and mechanical characterization such as thermocouples, accelerometers, strain gages, power supplies, and data acquisition systems
* Develop and execute reliability qualification plans based on product lifecycle requirements while interfacing with design and manufacturing partners and external suppliers
* Develop, with other functional disciplines, customer usage models and translate understanding of the customer into practical reliability test specifications
* Standardize methodologies and processes for increased effectiveness of qualification plans and sample sizes used
* Participate in component vendor selection activity and drive component qualification activity for components that are critical to Microsoft product requirements
* Use knowledge of manufacturing process capability as well as system-level performance requirements to establish Critical-to-Reliability performance metrics
* Monitor product performance in the field, understand customer-facing product issues and drive failure analysis and corrective action with the appropriate partner engineering teams
* Strong working knowledge of PCBA (printed circuit board assembly) and electronic component failure mechanisms
* Strong familiarity with industry standards, including IPC, JEDEC, Telcordia, and MIL standards
* Strong working knowledge of life test, ALT, HALT and HASS design and execution
* Knowledge of manufacturing methods for electronic components
Microsoft develops, licenses, and supports software, services, devices, and solutions.