Job Description
Overview

We are seeking a Senior AI Engineer to define and drive the end-to-end engineering of an enterprise-grade agentic orchestration capability that enables smart AI agents to autonomously execute workflows, collaborate with humans, and operate securely with governed access. This role owns the technical direction and delivery of core capabilities spanning agent workflow development environments, automated CI/CD and safe migration patterns, human-agent collaboration and long-running orchestration, and agent identity/registry/marketplace with policy enforcement. You will serve as the technical authority: establishing standards for reliability, auditability, security, and performance; driving cross-team execution; and ensuring adoption at scale through enablement and strong operational practices.

Responsibilities

1) Technical Direction, Architecture Standards & Roadmap Ownership (30%)
Define reference architecture, design standards, and engineering guardrails for agent workflow orchestration, human collaboration, and identity/governance capabilities. (Decide/Consult)
Own sequencing of releases, deprecation strategy, and compatibility standards to enable safe evolution with minimal disruption. (Decide)

2) Secure-by-Design Identity, Policy Enforcement & Auditability (25%)
Establish and enforce non-human identity patterns, consent propagation mechanisms, RBAC/ABAC policy models, and least-privilege access across agent workflows. (Decide/Consult)
Ensure end-to-end auditability for agent actions, prompt/tool changes, model switches, handoffs/messages, approvals, and access decisions; define evidence requirements for compliance. (Decide/Consult)
Define and enforce data classification, PII redaction, retention/purge, and policy-based routing to compliant models/providers. (Decide/Consult)

3) Deterministic Human-Agent Collaboration & Long-Running Orchestration (20%)
Define and drive implementation of deterministic handoff patterns (assign/escalate/co-pilot/co-author), resilient messaging, and stateful long-running workflows with timers and compensation/rollback. (Decide/Consult)
Ensure seamless integration into enterprise systems (CRM/ITSM/custom apps) via gateways and standardized interfaces. (Consult/Decide)

4) Automated Delivery, CI/CD Gates & Safe Migration Patterns (15%)
Define promotion gates and automated CI/CD standards including versioning, testing, security scans, approvals, and drift detection. (Decide/Consult)
Drive safe migration practices between model providers/versions with minimal downtime and proven rollback; define operational playbooks. (Decide/Consult)

5) Operational Excellence, Reliability & Enablement (10%)
Own SLIs/SLOs and operational posture: observability standards (metrics/logs/traces), incident and credential compromise runbooks, and release readiness reviews. (Decide/Consult)
Deliver enablement: reference implementations, developer playbooks, training for platform ops and application teams; mentor senior and junior engineers. (Consult/Execute)

Decision-Making Autonomy: High. Accountable for architecture standards, cross-team technical tradeoffs, governance posture, and operational readiness decisions.
Supervision Required: Low. Operates with periodic alignment to senior leadership and governance forums.
Complexity of Role: Very high. Enterprise-grade orchestration with strict security/audit requirements, multi-tenant isolation, deterministic workflow needs, and latency SLOs across multiple integrated systems.
Cross-Functional Interactions: Yes. Leadership-level engagement across security/identity, DevX, SRE, enterprise applications, and business/product stakeholders.

Qualifications

Key Skills/Experience Required

Minimum Qualifications
Bachelor's in CS/AI/ML/Data Science or equivalent experience required; Master's preferred.
10 years of experience in ML/Data Science/AI required.
Extensive experience designing and operating enterprise platforms/services with production reliability and governance requirements.

Required Expertise
Systems/platform architecture: multi-tenant isolation, scalability, versioning, backward compatibility, release sequencing
Orchestration and workflow systems: Temporal-class systems (or equivalent), including long-running workflows, compensation, state persistence
Identity and security architecture: SSO (SAML/OIDC), non-human identity, RBAC/ABAC, consent propagation, secret/key rotation, least-privilege design
Governance and compliance engineering: audit logging models, approval workflows, policy routing, PII redaction, retention/purge controls
Observability/SRE partnership: SLO definition, OTel-based telemetry, incident management, reliability engineering
Developer enablement: SDK design, reference implementations, platform adoption strategy, mentoring and technical leadership

Differentiating Competencies
Strategic thinking: shapes direction and standards; anticipates second-order impacts of platform decisions
Proactiveness & initiative: identifies systemic risks early (security, reliability, adoption) and drives resolution
Discretion: handles sensitive security/identity, compliance, and access-control topics appropriately
Financial acumen: frames tradeoffs across build vs. buy, provider choices, operational cost and risk
Executive communication: crisp narratives for governance forums; evidence-based recommendations and decisions
Organizational leadership: aligns multiple teams, mentors senior engineers, drives adoption and accountability