
GPU Performance Engineer | Experienced Hire

Job

Susquehanna International Group, LLP

Bala Cynwyd, PA (In Person)

Full-Time

Posted 7 weeks ago (Updated 7 weeks ago) • Actively hiring

Expires 5/27/2026

Job Description

Overview

We are looking for a GPU Performance Engineer to build highly optimized CUDA kernels for low-latency inference. This role is focused on workloads where off-the-shelf runtimes and vendor libraries do not fully exploit the structure of the model, and where custom kernels, memory layouts, and execution strategies can deliver meaningful gains.

You will work closely with quantitative researchers and engineers to understand model structure, identify computational bottlenecks, and turn mathematical ideas into production-grade GPU implementations. You will use your understanding of GPU hardware to help shape models that are both mathematically effective and efficient to run. The problems span compact neural networks, tree-based models, and other structured inference workloads where latency, throughput, and efficiency all matter.

This role is a strong fit for someone who enjoys low-level optimization, performance analysis, and translating abstract models into hardware-efficient code.

In this role, you will:

- Design, implement, and optimize custom CUDA kernels for latency-critical inference workloads
- Develop fine-grained GPU implementations tailored to specific model structures
- Analyze quantitative research models and computational bottlenecks to identify opportunities for parallelization and hardware-efficient execution
- Collaborate directly with quantitative researchers to translate mathematical models into high-performance compute pipelines
- Optimize end-to-end inference performance through kernel tuning, memory-layout design, execution strategy, I/O optimization, and precision tradeoffs
- Profile and benchmark GPU performance
- Improve latency and throughput in production inference systems
- Contribute to GPU architecture decisions and performance best practices

What we're looking for

- Strong proficiency in writing and optimizing CUDA kernels
- Solid programming experience in C/C++ (preferred)
- Deep understanding of GPU architecture, including memory hierarchy, SIMT execution, occupancy, and latency/throughput tradeoffs
- Ability to reason about numerical stability, precision, performance tradeoffs, and how model design choices affect hardware efficiency
- Strong problem-solving skills and comfort working with low-level systems

Preferred qualifications:

- PhD in Mathematics, Physics, Computer Science, Engineering, or related quantitative field
- Strong background in linear algebra, probability, numerical methods, or scientific computing
- Experience working with quantitative research teams or financial models
- Demonstrated ability to improve real-world inference performance beyond baseline framework or library implementations
- Familiarity with PTX-level behavior, tensor core utilization, or architecture-specific tuning
- Exposure to ONNX Runtime, TensorRT, Triton, TVM, or similar systems
- Exposure to:
  - Neural networks
  - Tree-based models (e.g., LightGBM)
  - State space models (e.g., Mamba architectures)
- Experience with kernel fusion, custom operators, model compilation, or graph-level optimization

About Susquehanna

Susquehanna is a global quantitative trading firm powered by scientific rigor, curiosity, and innovation. Our culture is intellectually driven and highly collaborative, bringing together researchers, engineers, and traders to design and deploy impactful strategies in our systematic trading environment.

To meet the unique challenges of global markets, Susquehanna applies machine learning and advanced quantitative research to vast datasets in order to uncover actionable insights and build effective strategies. By uniting deep market expertise with cutting-edge technology, we excel in solving complex problems and pushing boundaries together.

If you're a recruiting agency and want to partner with us, please reach out to recruiting@sig.com. Any resume or referral submitted in the absence of a signed agreement will not be eligible for an agency fee.

#LI-Onsite
