Kernel Engineer — Scientific Computing (SPU)

Vorticity Inc.

Redwood City, CA (In Person)

$145,000 Salary, Full-Time

Posted 3 days ago (Updated 17 hours ago) • Actively hiring

Expires 6/13/2026

Job Description

Job Details: Full-time, $120,000 - $170,000 a year, 6 hours ago

Qualifications: GPU programming, Performance tuning, Hypothesis testing, Writing skills, Algorithms, Analysis skills, C++, Prototype creation, Software development, Simulation systems, Senior level, Prototypes, Communication skills, Python

Full Job Description

Vorticity is building the world's first Scientific Processing Unit (SPU), a new class of silicon purpose-built to accelerate scientific computing beyond the limits of GPUs. We are designing tightly coupled software-hardware systems around applied mathematics workloads to deliver order-of-magnitude performance gains. Unlocking its full potential requires early, deep engagement from applied mathematics-driven software engineers who can translate real-world scientific workloads into executable models, kernels, libraries, and applications that inform both architecture and tooling decisions.

As a Kernel Engineer, you will work at the intersection of applied mathematics, scientific computing, parallel programming, and low-level performance engineering. You will help shape how numerical kernels are implemented, optimized, and eventually mapped onto the SPU. Your work may include building early numerical kernels and libraries, developing prototype applications, and writing Python-based workload models and simulators, all to support and inform the evolving hardware and compiler stack.

This requires both strong applied math fundamentals and deep low-level implementation ability. You should be comfortable moving from mathematical formulations to efficient kernels, reasoning about accuracy, stability, data movement, memory hierarchy, parallel execution, and compiler behavior along the way. This position is ideal for someone who combines strong scientific computing instincts with the low-level habits of a performance engineer.

Responsibilities:
Prototyping and implementing core kernels and low-level numerical primitives for the SPU.
Translating mathematical formulations into executable, performance-relevant kernel implementations.
Analyzing and optimizing memory-access patterns, including coalescing, locality, shared memory usage, cache behavior, register pressure, and host-device data movement.
Collaborating closely with hardware architects to evaluate algorithm-architecture tradeoffs around memory hierarchy, synchronization, vector/SIMT execution, instruction behavior, and parallel scheduling.
Working with compiler and runtime teams to ensure kernels map cleanly to the SPU programming model.
Designing microbenchmarks, correctness tests, numerical accuracy tests, and performance models, then iteratively refining kernels based on hardware evolution, compiler behavior, profiler output, and measured performance.

Core Skills:
Strong applied mathematics and scientific computing judgment, with the ability to understand numerical workloads deeply enough to implement them correctly and efficiently.
Strong proficiency in C++ and CUDA, HIP, SYCL, or an equivalent accelerator programming model.
Experience writing custom kernels, not just using existing frameworks or vendor libraries.
Ability to translate mathematical formulations into low-level implementations while balancing accuracy, stability, precision, data movement, and performance.
Deep understanding of GPU execution and memory hierarchy, including global memory, shared memory, registers, caches, coalescing, atomics, reductions, scans, warp-level execution, and occupancy.
Experience using profiling and performance tools to identify bottlenecks, test hypotheses, and validate improvements.
Ability to reason from profiler output to concrete code changes, rather than treating performance debugging as guesswork.
Solid concurrency fundamentals, including race conditions, atomicity, synchronization, and thread/process execution behavior.

Nice to Have Skills:
Familiarity with performance analysis tools or modeling techniques (profilers, roofline models).
Exposure to compilers, runtimes, or code generation frameworks.
Experience in applied scientific domains such as physics, geophysics, CFD, climate, materials, fusion, or finance.
Experience with low-level GPU assembly or intermediate representations.
Familiarity with low-level system software or drivers.

Non-Technical Qualities:
Excellent written and verbal communication skills.
Strong ability to work independently and collaboratively in a team.
Comfort operating in an early-stage environment where the hardware, compiler, and software stack are evolving together.
Willingness to put in the hard work needed to bring the SPU to life.

Above all: low ego.

As passionate scientists and engineers, we are well aware of the plethora of critical problems in the world that cannot be solved because humanity simply does not have enough computing power. To address this, Vorticity is developing a radically new silicon chip architecture and system to dramatically accelerate scientific computing problems. Vorticity's mission is to expand human ingenuity. To do that we are building a team of exceptional people to work together on big problems. Join us!

Compensation Range:
$120K - $170K
