OpenShift Operations & Platform Management
Operational Leadership & Governance:
Own the operational health, lifecycle, and governance of Kubernetes/OpenShift platforms. Drive standards, procedures, and operational readiness across all environments.
Incident Management Leadership:
Lead major incidents, coordinate cross-team response, ensure timely restoration, manage communication, and drive root-cause analysis and permanent corrective action.
Reliability & Stability Improvements:
Lead initiatives to enhance resilience, reduce platform incidents, eliminate recurring issues, and increase automation maturity.
Operational Automation & Toil Reduction:
Direct automation strategy for run operations, leveraging Python, GitOps, AI-assisted tooling, and self-service workflows to reduce manual effort.
OpenShift Platform Readiness:
Lead OpenShift cluster lifecycle activities, including new cluster builds, configuration, onboarding, upgrades, and cluster decommissioning, ensuring consistency, reliability, and compliance across environments.
Cross-Team Operational Enablement:
Influence engineering, security, and development teams to adopt consistent operational patterns, guardrails, and readiness practices.
Compliance & Controls:
Ensure adherence to regulatory, security, and audit requirements; maintain strong operational hygiene and documentation.
Mentorship & Team Development:
Guide team members, build skill maturity, and strengthen operational best practices across the organization.
Required Qualifications:
- 8+ years of experience in Systems Operations, Cloud Operations, SRE, or related operational roles
- 6+ years of experience in Kubernetes/OpenShift platform operations
- 4+ years of Linux systems operations experience
- Experience leading operational teams or platform operations functions
- Strong track record of operational automation and incident leadership