Staff Software Engineer - Platform & Reliability

Job

The Search Solutions, LLC

Full-Time

Posted 3 weeks ago (Updated 1 week ago) • Actively hiring

Expires 6/2/2026

Job Description

The Sr. Staff Software Engineer - Platform & Reliability will be part of the new Product Engineering team tasked with designing and building the next generation of Agentic AI-powered products for [company name omitted]. Acting as the Technical Lead and Primary Architect, you will be a hands-on leader responsible for the team's overall delivery of the runtime environment and automation for AI services and Agents. You will lead a small squad by decomposing complex platform requirements (such as AI-specific CI/CD, agent observability, and automated scaling) into actionable tasks while remaining deeply embedded in the codebase.

Key Responsibilities
  • Technical Lead & Execution: Lead the technical delivery of the Agentic Platform by translating high-level infrastructure roadmaps into actionable development tasks. You will own task breakdown for your squad, ensuring high-quality output through technical mentorship and rigorous architectural oversight.
  • Automated Agent Delivery (CI/CD): Architect and implement high-velocity CI/CD pipelines specifically designed for the lifecycle of AI Agents and services, including automated model evaluation and blue-green deployments for agentic workflows on Google Cloud Platform.
  • Cloud Infrastructure Engineering: Lead the design and implementation of our cloud-native infrastructure on Google Cloud Platform using Terraform and Kubernetes (GKE). You will own the core runtime environment where autonomous agents and transactional microservices coexist.
  • Agentic Observability & SRE: Apply SRE principles to build a specialized monitoring and alerting stack for AI agents. You will implement tracing for agent "reasoning loops" and ensure the reliability of the underlying Vector and Graph data stores.
  • AI-Native SDLC Leadership: Actively utilize coding agents to plan, generate, and refactor platform code and Infrastructure as Code (IaC), maintaining high velocity while ensuring code quality.
  • Scale & Performance: Monitor and optimize the performance and cost-effectiveness of AI workloads, ensuring our platform can handle high-frequency agent calls and multi-modal data processing.
  • Security & Governance: Own the implementation of secure runtime boundaries, ensuring that both human users and AI agents operate within strict, audited permission sets.
Experience:
10+ years of Software or Platform Engineering experience, with a background as a hands-on engineer who has successfully led technical squads.
Technical Stack:
Expert mastery of Google Cloud Platform (GKE, Vertex AI), Terraform, Kubernetes, and Python.
Product AI Platform:
Proven track record of designing and shipping production platforms for AI/LLM workloads, including specialized CI/CD and observability for agentic architectures.
Reliability Mindset:
Strong command of SRE principles, including experience with SLOs, error budgets, and troubleshooting complex distributed systems.
Cloud Infrastructure:
Experience working with cloud platforms (Google Cloud Platform, AWS) and deploying containerized services that are secure and scalable.
Coding Agents:
Demonstrated proficiency in using coding agents to accelerate the SDLC and plan and code complex engineering tasks.
