Principal Supercomputing Operations Engineering Manager

Microsoft Corporation
Job Type: Full-time
Posted: February 28, 2026
Location: Multiple Locations, United States

Job Description

**Overview**

Microsoft Azure’s Artificial Intelligence and High Performance Computing (AI/HPC) organization powers some of the world’s largest cloud-native supercomputers, used for frontier AI training, scientific computing, and large-scale distributed simulations. Our team builds and operates hyperscale GPU clusters that consistently place Azure among global leaders in the Top500, MLPerf, and Graph500 benchmarks. By joining us, you step into the engineering core responsible for ensuring these systems remain reliable, performant, and ready for the next wave of AI innovation.

At this scale, interconnect fabrics are a first-order reliability system that directly determines GPU availability, training throughput, and customer SLAs. As a Principal Supercomputing Operations Engineering Manager, you own the operational strategy and organizational execution for interconnect fabric reliability across flagship AI supercomputing environments. You lead teams that operate InfiniBan...

Job Overview

Deadline: March 08, 2026