Inference Engineer
Job
Acceler8 Talent
Los Altos, CA (In Person)
Full-Time
Job Description
Inference Engineer at Acceler8 Talent in Los Altos, California. Posted 10 minutes ago.
Type: Full-time
Inference Engineer

We're partnered with an AI infrastructure company building next-generation systems for large-scale AI workloads. Their platform is rethinking how inference runs at scale, intelligently orchestrating workloads across heterogeneous hardware to unlock major gains in performance, efficiency, and cost.

The team is solving some of the hardest problems in modern AI infrastructure: inference scheduling, KV cache management, runtime optimization, memory efficiency, and low-latency serving across distributed systems.

They're looking for engineers who care deeply about how models execute in production: not just training models, but making them fast, scalable, and reliable under real-world load.

What You'll Work On
Designing and optimizing large-scale inference pipelines
Improving latency, throughput, and concurrency under production workloads
Building inference runtimes and serving infrastructure
Optimizing batching, scheduling, and request orchestration
Managing KV cache allocation, reuse, placement, and eviction strategies
Improving prefill/decode performance and memory efficiency
Profiling bottlenecks across model, runtime, and distributed system layers
Collaborating closely with compiler, kernel, and systems engineers

What They're Looking For
Strong systems engineering fundamentals
Experience building or scaling ML inference / model serving systems
Deep understanding of performance optimization and memory behavior
Experience with runtimes such as vLLM, TensorRT-LLM, or custom serving infrastructure
Strong understanding of transformer architectures and attention mechanisms
Familiarity with batching, scheduling, concurrency, and cache management
Strong Python and/or C++ engineering skills

Why Join
Work on cutting-edge inference infrastructure and AI systems problems
Build systems designed for next-generation AI scale
Small, highly technical engineering team
Significant ownership and technical impact
Opportunity to shape foundational infrastructure for future AI workloads