Senior Deep Learning Frameworks CUDA Software Engineer
2100 NVIDIA USA
Santa Clara, CA (In Person)
$235,750 Salary, Full-Time
Job Description
What you will be doing:
Integrate new CUDA features and Runtime abstractions in AI frameworks: from PoC to performance analysis to production.
Perform deep analysis of AI workloads and frameworks to identify requirements and opportunities to innovate in the lower layers of the stack.
Collaborate hands-on with teams working on the latest AI models.
Own and drive improvements in the AI Compiler-Runtime interface to build speed-of-light multi-GPU multi-node solutions.
Design fault-tolerant and elastic solutions for large-scale or dynamic AI workloads.
Influence the roadmap of core CUDA to facilitate building next-gen DL frameworks.
Collaborate with a very dynamic team across multiple time zones.
Collaborate closely with AI researchers, HW and SW architects, kernel and compiler authors and CUDA driver experts to co-design systems and frameworks that enhance performance and programmability.
Develop exploratory tools and runtime systems to profile and accelerate new paradigms in deep learning.
Write clean, effective, and maintainable code, ensuring exploratory prototypes can smoothly transition into open-source releases, upstream framework integrations, internal tools, or closed-source commercial products.
What we need to see:
BS, MS, or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or related field (or equivalent experience).
8+ years of relevant industry experience, or equivalent academic experience after completing the degree.
Development experience with deep learning frameworks such as PyTorch and JAX, and inference engines such as TRT-LLM, vLLM, and SGLang.
Rapid prototyping and development with Python, C++, CUDA, or related DSLs.
Solid grasp of AI models, parallelisms, and/or compiler technologies (e.g. torch.compile).
Experience conducting performance benchmarking on AI clusters.
Familiarity with at least one performance profiler toolchain (PyTorch profiler, NVIDIA Nsight Systems).
Understanding of HPC/AI communication concepts.
Good understanding of computer system architecture, HW-SW interactions, and operating systems principles (aka systems software fundamentals).
Adaptability and passion to learn new frameworks and tools.
Flexibility to work and communicate effectively across different teams and time zones.
Ways to stand out from the crowd:
Deep expertise in the performance internals and execution graphs of major deep learning autograd, training, and inference frameworks (e.g., PyTorch, JAX, TensorRT, vLLM, SGLang, NeMo, Megatron, MaxText, etc.).
Hands-on experience with CUDA, specific communication libraries (e.g., NCCL, MPI, UCX), and distributed machine learning techniques (e.g., pipeline parallelism, tensor parallelism).
Expertise in one or more of these areas: training, distributed inference, MoE, reinforcement learning, kernel authoring (on CUDA, Triton, CuTe, etc.).
Background in deep learning compilers, both graph-level and codegen (e.g., Triton, XLA, torch.compile).
Experience with programming for compute and communication overlap in a distributed runtime.
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits. Applications for this job will be accepted at least until May 18, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes. NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law. NVIDIA pioneered accelerated computing. Today, our AI infrastructure powers global intelligence, transforming every industry. Learn more about NVIDIA.