Member of Technical Staff, ML Infrastructure & Inference

Acceler8 Talent

San Lorenzo, CA (In Person)

Full-Time

Posted 2 days ago (Updated 2 hours ago) • Actively hiring

Expires 6/22/2026


Job Description


Overview

We are a cutting-edge AI infrastructure company building a scalable cloud platform for next-generation machine learning workloads ($80M Series A). As AI systems grow in complexity, traditional infrastructure models are running into limits on efficiency, scalability, and cost. The platform addresses these challenges through a hardware-agnostic architecture that dynamically maps workloads across diverse accelerator environments, enabling higher utilization and better performance across multi-vendor systems. The company is also developing production-grade infrastructure for agentic AI applications, allowing customers to deploy and manage workloads through simple APIs without handling low-level optimization or hardware orchestration.

The Role

The team is seeking a Member of Technical Staff focused on ML systems and inference infrastructure. In this role, you will build and optimize large-scale inference systems that serve modern AI models efficiently in production environments. You'll work across runtime behavior, scheduling, memory management, and system optimization to improve latency, throughput, and scalability. This opportunity is well suited to engineers who understand how modern models execute at scale and enjoy solving deep performance challenges across the inference stack.

Responsibilities

Design and optimize end-to-end inference pipelines from request intake through response generation
Build scalable inference runtimes optimized for latency, throughput, and concurrency
Improve batching, scheduling, and queueing strategies under real-world production workloads
Develop efficient KV cache allocation, reuse, and eviction strategies
Optimize prefill and decode execution paths, including attention and memory performance
Debug and profile bottlenecks across models, runtimes, and distributed systems
Partner with compiler, kernel, networking, and infrastructure teams to improve system-wide performance

Required Qualifications

Strong software engineering and systems fundamentals
Experience building or operating ML inference or model serving systems
Understanding of runtime performance, memory usage, and system behavior under load

Preferred Qualifications

Experience with inference frameworks such as TensorRT-LLM, vLLM, or custom serving infrastructure
Strong understanding of transformer architectures and attention mechanisms
Experience with batching, scheduling, and concurrency optimization in inference systems
Familiarity with KV cache management and memory placement strategies
Experience tuning latency- and throughput-sensitive systems
Strong programming skills in Python and C++

Based onsite in SF.

Keywords:

ML Systems, Inference Infrastructure, LLM Inference, Model Serving, Distributed Systems, GPU Infrastructure, AI Infrastructure, Inference Runtime, TensorRT-LLM, vLLM, Transformer Architecture, Attention Mechanisms, KV Cache, Memory Optimization, Latency Optimization, Throughput Optimization, Concurrency Control, Batching, Scheduling Systems, Runtime Optimization, Performance Profiling, Scalable Inference, Distributed Inference, CUDA, PyTorch
