Job Description
What You Will Own:
Linux Infrastructure Operations
+ Full lifecycle administration of ~222 Linux servers (production, QA, development)
+ OS upgrades, patch management, and kernel updates
+ Performance monitoring and system tuning (CPU, memory, disk I/O, network)
+ User access management and authentication integrations (LDAP/AD)
+ Backup validation and disaster recovery readiness

Kubernetes / OpenShift Platform Ownership
+ Deploy, administer, and support Kubernetes/OpenShift clusters across environments
+ Manage cluster lifecycle: installation, upgrades, patching, and scaling
+ Configure and maintain:
  + Namespaces, RBAC, and security policies
  + Networking (CNI, ingress controllers, load balancing)
  + Persistent storage (PVCs, storage classes)
+ Support application teams with container deployments, troubleshooting, and performance tuning
+ Monitor cluster health using tools like Prometheus, Grafana, and native OpenShift tooling
+ Optimize cluster resource utilization and capacity planning
+ Implement and maintain CI/CD integrations for containerized workloads

Security & Hardening
+ Implement and maintain patching cadence across Linux and Kubernetes environments
+ System hardening aligned to CIS/STIG best practices
+ SELinux configuration and enforcement
+ Firewall configuration (iptables / firewalld)
+ Kubernetes security best practices (RBAC, pod security standards, image scanning)
+ Support vulnerability remediation from tools (Tenable, Qualys, etc.)
+ Log monitoring and audit review across infrastructure and containers

Incident Response & Production Stability
+ Lead root cause analysis (RCA) for infrastructure and platform incidents
+ Participate in on-call support for critical systems and clusters
+ Resolve Sev1/Sev2 outages across Linux and Kubernetes environments
+ Develop post-incident documentation and preventative controls

Modernization & Automation
+ Assess and remediate deprecated platform components
+ Standardize system and cluster configurations
+ Build documentation and operational runbooks
+ Drive infrastructure-as-code and automation initiatives (Ansible, Terraform, etc.)
+ Support migration of legacy workloads to containerized platforms