Senior Deep Learning Frameworks CUDA Software Engineer

Job

2100 NVIDIA USA

Santa Clara, CA (In Person)

$235,750 Salary, Full-Time

Posted 4 days ago (Updated 2 days ago) • Actively hiring

Expires 6/16/2026

Job Description

NVIDIA is leading the way in groundbreaking developments in Artificial Intelligence, High Performance Computing and Visualization. The GPU, our invention, serves as the visual cortex of modern computers and is at the heart of our products and services. Our work opens up new universes to explore, enables amazing creativity and discovery, and powers what were once science fiction inventions, from artificial intelligence to autonomous cars. We are looking for a motivated Deep Learning engineer to bring advanced CUDA features and Distributed Runtime technologies into AI stacks, including PyTorch, TRT-LLM, vLLM, SGLang, and JAX. You will be working with the team that created core CUDA features and runtimes for scaling Deep Learning and HPC applications. Your customers will have diverse multi-GPU demands, ranging from training at scales of up to 100K GPUs to inference at microsecond latency. CUDA features improve both the productivity and the performance of AI applications, and your work in AI toolkits will accelerate bringing those features to the community. This is an outstanding opportunity for someone with an AI background to advance the state of the art in this space. Are you ready to contribute to the development of innovative technologies and help realize NVIDIA's vision?

What you will be doing:

Integrate new CUDA features and Runtime abstractions into AI frameworks, from proof of concept (PoC) through performance analysis to production.
Perform deep analysis of AI workloads and frameworks to identify requirements and opportunities to innovate in the lower layers of the stack.
Collaborate hands-on with teams working on the latest AI models.
Own and drive improvements in the AI Compiler-Runtime interface to build speed-of-light multi-GPU, multi-node solutions.
Design fault-tolerant and elastic solutions for large-scale or dynamic AI workloads.
Influence the roadmap of core CUDA to facilitate building next-generation DL frameworks.
Collaborate with a very dynamic team across multiple time zones.
Collaborate closely with AI researchers, HW and SW architects, kernel and compiler authors, and CUDA driver experts to co-design systems and frameworks that enhance performance and programmability.
Develop exploratory tools and runtime systems to profile and accelerate new paradigms in deep learning.
Write clean, effective, and maintainable code, ensuring exploratory prototypes can smoothly transition into open-source releases, upstream framework integrations, internal tools, or closed-source commercial products.

What we need to see:

BS, MS, or PhD degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field (or equivalent experience).
8+ years of relevant industry experience, or equivalent academic experience, after completing your degree.
Development experience with Deep Learning Frameworks such as PyTorch and JAX, and inference engines such as TRT-LLM, vLLM, and SGLang.
Rapid prototyping and development with Python, C++, CUDA, or related DSLs.
Solid grasp of AI models, parallelisms, and/or compiler technologies (e.g., torch.compile).
Experience conducting performance benchmarking on AI clusters.
Familiarity with at least one performance profiler toolchain (e.g., PyTorch Profiler, NVIDIA Nsight Systems).
Understanding of HPC/AI communication concepts.
Good understanding of computer system architecture, HW-SW interactions, and operating systems principles (aka systems software fundamentals).
Adaptability and passion to learn new frameworks and tools.
Flexibility to work and communicate effectively across different teams and time zones.

Ways to stand out from the crowd:

Deep expertise in the performance internals and execution graphs of major deep learning autograd, training, and inference frameworks (e.g., PyTorch, JAX, TensorRT, vLLM, SGLang, NeMo, Megatron, MaxText).
Hands-on experience with CUDA, communication libraries (e.g., NCCL, MPI, UCX), and distributed machine learning techniques (e.g., pipeline parallelism, tensor parallelism).
Expertise in one or more of these areas: training, distributed inference, MoE, reinforcement learning, kernel authoring (in CUDA, Triton, CuTe, etc.).
Background in deep learning compilers, both graph-level and codegen (e.g., Triton, XLA, torch.compile).
Experience programming compute and communication overlap in distributed runtimes.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 4, and 224,000 USD - 356,500 USD for Level 5. You will also be eligible for equity and benefits. Applications for this job will be accepted at least until May 18, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law. NVIDIA pioneered accelerated computing. Today, our AI infrastructure powers global intelligence, transforming every industry. Learn more about NVIDIA.

Similar remote jobs

Job
Development Assistant
The Advocates for Human Rights
Minneapolis, MN
Posted 12 hours ago
Updated 33 minutes ago

Job
Senior Product Designer
Best Life
Raleigh, NC
Posted 1 day ago
Updated 33 minutes ago

Job
Private Practice Therapist / Counselor
TCA Counseling Group
Boston, MA
Posted 1 day ago
Updated 33 minutes ago

Job
Senior Business Analyst
Cognizant
Lansing, MI
Posted 1 day ago
Updated 33 minutes ago

Job
Grievances & Appeals Expedited Case Rep
Humana
Lansing, MI
Posted 1 day ago
Updated 33 minutes ago

Similar jobs in Santa Clara, CA

Job
Administrative Assistant
Primary Talent Partners
Santa Clara, CA
Posted 1 day ago
Updated 33 minutes ago

Job
Team Lead Bloom Nails (Westfield Valley Fair)
BLOOM NAILS
Santa Clara, CA
Posted 1 day ago
Updated 33 minutes ago

Job
Physical Therapist Assistant (PTA) Injury Prevention Consultant Non-Clinical | On-Site | Full-Time
Onsite Ergonomics
Santa Clara, CA
Posted 1 day ago
Updated 33 minutes ago

Job
Rack Level Integration Engineer
Spectraforce Technologies Inc
Santa Clara, CA
Posted 1 day ago
Updated 33 minutes ago

Job
Relief Residential Counselor - SNS & THPP/NMD
BILL WILSON CENTER
Santa Clara, CA
Posted 1 day ago
Updated 33 minutes ago

Similar jobs in California

Job
Transitional Kindergarten Teacher
LA Catholics
Los Angeles, CA
Posted 12 hours ago
Updated 33 minutes ago

Job
Regal Mira Mesa PT - Team Lead - Starting $21.50/hr
Regal Entertainment Group
San Diego, CA
Posted 1 day ago
Updated 33 minutes ago

Job
Smog Technician
Midas
Orange, CA
Posted 1 day ago
Updated 33 minutes ago

Job
Field Service Mechanic - Heavy Duty Work Trucks
I-State Truck Center
Sacramento, CA
Posted 1 day ago
Updated 33 minutes ago

Job
AI Development Architect
SAP
Palo Alto, CA
Posted 1 day ago
Updated 33 minutes ago