NPU Operator Engineer

Job

Black Sesame Technologies Inc

Campbell, CA (In Person)

Full-Time

Posted 5 days ago (Updated 4 hours ago) • Actively hiring

Expires 6/11/2026

Job Description

NPU Operator Engineer at Black Sesame Technologies Inc in Campbell, California. Posted 1 day ago.
Type: full-time
We are looking for a Junior NPU Kernel/Operator Engineer to develop and optimize deep learning operators for a custom AI accelerator / NPU. The role focuses on kernel/operator implementation, performance tuning, and correctness validation across a broad range of neural network workloads. This is a good fit for candidates with strong C/C++ and Python skills who are interested in hardware-aware software optimization. Prior NPU experience is helpful but not required.

Responsibilities

Implement and optimize NPU operators such as normalization, reduction, transpose, reshape, gather/scatter, quant/dequant, and fused elementwise kernels.

Tune kernels for memory bandwidth, SRAM usage, data reuse, DMA latency, bank conflicts, and compute utilization.

Validate operator correctness against PyTorch, NumPy, or framework reference results.

Benchmark performance on simulator or silicon.

Debug correctness, precision, memory layout, and performance issues.

Work with compiler, runtime, hardware, and model teams.

Document operator behavior, tensor layout, tiling strategy, and performance results.

Requirements

BS/MS in CS, EE, Computer Engineering, or related field.

Strong C/C++ and Python programming skills.

Basic understanding of tensor computation and neural network operators.

Familiarity with basic computer architecture concepts such as memory hierarchy, bandwidth, latency, cache/SRAM, and parallelism.

Good debugging and problem-solving skills.

Preferred

Experience with any of the following: CUDA, Triton, OpenCL, TVM, MLIR, Halide; SIMD, DSP, embedded C/C++, GPU, NPU, FPGA, or HPC programming; compiler/runtime development.

Understanding of tiling, vectorization, memory access optimization, or mixed precision.

Experience with FP32, FP16, BF16, INT8, or other numerical formats.
