Principal Group Software Engineering Manager

Job

Microsoft

Redmond, WA (In Person)

Full-Time

Posted 2 weeks ago (Updated 1 day ago) • Actively hiring

Expires 7/4/2026

Job Description

Design and scale capacity intake, planning, and deployment, reducing model time-to-production and meeting SLAs (service level agreements) for priority workloads through automation and data-driven operations.
Build a unified control plane that connects intake, planning, deployment, and fleet operations, enabling global optimization across cost, latency, compliance, and flexible model scaling (0→1 platform ownership).
Build and lead a high-performing organization of engineering managers and senior engineers across capacity buildouts/automation, capacity planning, and the control plane.
Set the strategy and roadmap for Copilot capacity management and the control plane.
Drive execution across existing teams today, with a clear plan to grow the org as control plane scope expands.
Partner deeply with Copilot, AI Core, and Azure to align demand, supply, and COGS (cost of goods sold) for Copilot workloads.
Own live-site, reliability, and operational excellence for the capacity surface area.
Establish metrics and SLAs for intake latency, fleet utilization, automation coverage, and time-to-deploy; use them to guide investment decisions.
Coach and grow managers and senior ICs (individual contributors); raise the engineering bar; recruit experienced platform leaders into the team.
Represent capacity in executive reviews and cross-org leadership forums; communicate trade-offs between cost, speed, and reliability with clarity.

Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
These requirements include but are not limited to the following specialized security screenings:
Master's Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR Bachelor's Degree in Computer Science or related technical field AND 15+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
6+ years people management experience.
Experience as a manager of managers leading distributed-systems or platform engineering teams at scale.
Demonstrated success building and operating large-scale distributed systems, control planes, orchestration platforms, or cloud infrastructure.
Track record of taking a platform from concept to broad production adoption: design, staffing, execution, and live-site ownership.
Systems thinking; able to identify and remove bottlenecks across intake, planning, scheduling, deployment, and operations.
Experience driving multi-org programs and influencing partner teams without direct authority.
Ability to translate ambiguous business needs into clear engineering strategy, priorities, and execution plans.
Hiring, coaching, and people-development track record across multiple levels.
Experience with large capacity fleets, AI/ML infrastructure, or large-scale inference or training systems.
Experience with capacity planning, fleet management, or supply/demand optimization at hyperscale.
Familiarity with Azure, M365, and AI workloads; understanding of inference and training cost models (COGS, utilization, throughput per GPU).
Background building automation, control planes, or orchestration platforms from 0→1.