Principal Researcher
Microsoft
Redmond, WA (In Person)
$247,100 Salary, Full-Time
Job Description
Full-time, $163,000 to $331,200 a year
Qualifications: GPU programming, TensorFlow, GPU architecture, Doctoral degree, Scientific publications, PyTorch, Technical documentation, Quantization, 3 years, Master's degree, C++, Bachelor's degree, Algorithm design, Prototype creation, Participating in conferences, Open source contribution, Benchmarking, Machine learning libraries, Senior level, AI, Research & development, Machine learning frameworks, Prototypes, Debugging, Engineering validation
Full Job Description
Overview
Generative AI is transforming how people create, collaborate, and communicate, redefining productivity across Microsoft 365 for customers worldwide.
… and long-term product innovation through close collaboration with research and product teams across the company.
… and kernel-level optimizations, exploring algorithmic, systems, and hardware/software co-design techniques.
For more, see: https://aka.ms/efficient-ai
Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
Formulate, develop, and evaluate new algorithmic and system-level approaches for end-to-end AI serving. Use analytical modeling and large-scale measurement to study token-level latency, tail latency (p95/p99), throughput per dollar, cold-start behavior, warm-pool strategies, and capacity planning under multi-tenant SLOs and variable sequence lengths.
Design and experimentally evaluate endpoint configuration and execution policies, including batching, routing, and scheduling strategies; tensor and pipeline parallelism; quantization and precision profiles; speculative decoding; and chunked or streaming generation. Drive the most promising approaches through robust rollout and validation into production.
Perform hardware- and kernel-aware optimization, collaborating closely with model, kernel, compiler, and hardware teams to align serving algorithms with attention/KV-cache innovations and accelerator capabilities.
Qualifications
Required Qualifications:
Doctorate in relevant field AND 6+ years related research experience OR Master's Degree in relevant field AND 7+ years related research experience OR Bachelor's Degree in relevant field AND 9+ years related research experience OR equivalent experience.
Other Requirements:
The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
Doctorate in relevant field AND 8+ years related research experience OR Master's Degree in relevant field AND 12+ years related research experience OR Bachelor's Degree in relevant field AND 15+ years related research experience OR equivalent experience.
Experience publishing academic papers as a lead author or essential contributor.
Experience participating in a top conference in a relevant research domain.
Demonstrated experience designing and optimizing efficient inference systems, combining foundations in algorithmic optimization, parallel computing, and request orchestration under strict SLO constraints with deep knowledge of attention and KV-cache optimizations, batching and scheduling strategies, and cost-aware deployment.
3+ years of experience with machine learning frameworks (e.g., PyTorch, TensorFlow) and inference serving frameworks (e.g., vLLM, Triton Inference Server, TensorRT-LLM, ONNX Runtime, Ray Serve, DeepSpeed-MII).
3+ years of experience in GPU programming and optimization, with expert knowledge of CUDA, ROCm, Triton, PTX, CUTLASS, or similar GPU programming frameworks.
Experience in C++ and Python for high-performance systems, with code-quality and profiling/debugging skills.
Research impact through publications and/or patents, coupled with hands-on experience taking research ideas through execution and delivery in production.
Research Sciences, IC6
The typical base pay range for this role across the U.S. is USD $163,000 to $296,400 per year. A different range applies to specific work locations within the San Francisco Bay Area and the New York City metropolitan area; the base pay range for this role in those locations is USD $220,800 to $331,200 per year.
Similar jobs in Redmond, WA
Amazon.com, Inc.
Redmond, WA
Amazon
Redmond, WA
Similar jobs in Washington
Software Specialists Inc.
Tukwila, WA
EPAM Systems
Seattle, WA
Soufflet Malt
Vancouver, WA
Stripe
Seattle, WA