Sr. Manager, ML Acceleration and Performance

Job

Rivian

Palo Alto, CA (In Person)

Full-Time

Posted 2 weeks ago (Updated 1 week ago) • Actively hiring

Expires 6/23/2026

Job Description

Qualifications

AI models, GPU programming, PyTorch, System performance optimization, Quantization, Research, AI platforms (beyond public GPTs), Scalable systems, Computational framework, Production systems, C++, Managing engineering teams, Model deployment, Assembly language, Computer hardware, Developing large-scale AI models, Scalability, Model training, Simulation tools, Machine learning libraries, Machine learning frameworks, Providing code feedback, Stakeholder management

Full Job Description

In this role, you will lead and manage a high-caliber team within Rivian's Perception organization. You are responsible for the end-to-end strategy of how our most advanced neural networks are compressed, optimized, and deployed onto Rivian's custom embedded compute platforms. You will bridge the gap between high-level ML research and low-level silicon constraints, ensuring that Rivian's autonomy stack remains "performance-first" while scaling to meet next-generation safety requirements.
Team Leadership & Mentorship:
Build, lead, and develop a world-class team of acceleration engineers. Manage performance, set technical goals, and foster a culture of high-performance systems engineering.
Acceleration Roadmap:
Define the 2-3 year strategy for model compression (Pruning, Quantization, NAS) and runtime optimization. Determine when to build custom in-house kernels versus leveraging vendor libraries (TensorRT, SNPE).
Hardware-Software Co-Design:
Act as the primary stakeholder for the Perception team when collaborating with Hardware Architecture. Influence the design of next-gen Rivian silicon by characterizing current model bottlenecks and predicting future compute requirements.
Cross-Functional Delivery:
Partner with Perception, Planning, and Embedded Systems leads to ensure that "Research" models can actually run in real time on-vehicle without compromising safety or exceeding thermal envelopes.
Infrastructure Strategy:
Oversee the development of automated profiling and CI/CD benchmarking pipelines that track latency, memory, and energy consumption across the entire fleet.
Education:
MS or Ph.D. in CS, EE, or a related field with 10+ years of industry experience, including 2+ years in a technical leadership or management capacity.
System Mastery:
Expert-level knowledge of the ML stack: from High-level Frameworks (PyTorch) to IR/Compilers (MLIR, TVM, XLA) to Silicon (GPU/NPU/DSP).
Optimization Specialist:
Proven track record of deploying large-scale models into production via Quantization-Aware Training (QAT), FP8/INT4 precision, and Neural Architecture Search (NAS).
Architectural Fluency:
Ability to read hardware spec sheets (data sheets, ISA) and translate "Peak TFLOPS" into realistic "Application Throughput."
Low-Level Coding:
Proficient in C++, CUDA, and assembly-level optimization, with the ability to perform deep-dive code reviews on custom kernels.
0th and 1st Order Thinking:
Deep understanding of the "Elephant in the room": optimizing non-differentiable planning objectives and managing the trade-offs between open-loop and closed-loop simulation efficiency.
Profiling Expert:
Mastery of system-wide profiling tools (NVIDIA Nsight, PyTorch Profiler, VTune) to identify bottlenecks across the CPU-GPU-NPU interconnects.