Job Description
Systems Engineer - Cloud Ops
Memphis, TN, United States

As a Systems Engineer on the Cloud Operations team, you will be responsible for deploying, managing, and optimizing our cloud-based infrastructure on Google Cloud Platform (GCP). You will work with technologies such as Terraform, Kubernetes (GKE), GitOps/Argo CD, CI/CD pipelines, and observability tools to ensure reliable, secure, and scalable platform operations. You will also contribute to our AI/ML platform initiatives, supporting infrastructure for LLM-based applications and AI-powered automation tools that enhance developer productivity and operational efficiency.

You will collaborate with development teams, SREs, and platform architects to ensure seamless deployment and delivery of applications while maintaining the highest standards of reliability, security, and performance.

Responsibilities

Cloud Infrastructure, Automation & Operations:
- Design, build, and maintain cloud infrastructure using Terraform to automate provisioning, scaling, and lifecycle management of resources on GCP
- Develop and maintain CI/CD pipelines using GitLab CI to automate build, test, and deployment workflows
- Implement and maintain GitOps practices using ArgoCD for declarative, version-controlled application deployment
- Monitor system performance using observability tools (Dynatrace, Cloud Monitoring, Prometheus/Grafana) and troubleshoot production issues
- Participate in on-call rotation to provide 24/7 support for critical infrastructure incidents
- Perform root cause analysis on incidents and implement preventive measures
- Document runbooks, architecture decisions, and operational procedures

Kubernetes Platform Management:
- Deploy, configure, and manage containerized applications on Google Kubernetes Engine (GKE), including GKE Autopilot and Standard clusters
- Manage cluster lifecycle including upgrades, node pool configurations, and capacity planning
- Troubleshoot pod failures, CrashLoopBackOff, OOMKilled events, and container resource issues
- Configure and optimize resource requests/limits, Horizontal Pod Autoscaler (HPA), and Vertical Pod Autoscaler (VPA)
- Manage Kubernetes networking including Services, Ingress controllers, Network Policies, and DNS configurations
- Implement and manage service mesh (Istio) for traffic management, observability, and security
- Manage secrets and configurations using Kubernetes Secrets, ConfigMaps, and external secret management tools
- Implement pod security standards, RBAC policies, and workload identity configurations

AI/ML Platform & Automation:
- Support infrastructure for AI/ML workloads including LLM-based applications and model serving platforms
- Deploy and manage AI-powered developer tools such as coding assistants (Claude Code, GitHub Copilot) and agentic AI systems
- Explore and implement AI-assisted incident response and automated remediation workflows
- Build and maintain infrastructure for Retrieval-Augmented Generation (RAG) pipelines and vector databases
- Configure GPU-enabled node pools and optimize resource allocation for AI/ML workloads
- Implement MCP (Model Context Protocol) servers and AI agent integrations for operational automation
- Stay current with emerging AI technologies and evaluate their applicability for infrastructure automation

Qualifications

Kubernetes Expertise (Essential):
- 3+ years hands-on experience with Kubernetes in production environments
- Deep understanding of Kubernetes architecture: API server, etcd, scheduler, controller manager, kubelet
- Experience with GKE (Standard and Autopilot modes), including cluster creation, upgrades, and maintenance
- Proficiency in troubleshooting workloads: analyzing pod logs, events, describe outputs, and container states
- Strong understanding of resource management: requests, limits, QoS classes, and resource quotas
- Experience with Kubernetes networking: Services (ClusterIP, NodePort, LoadBalancer), Ingress, Network Policies
- Knowledge of Kubernetes storage: PersistentVolumes, PersistentVolumeClaims, StorageClasses, dynamic provisioning
- Experience with Helm charts for application packaging and deployment
- Familiarity with Kubernetes security: RBAC, Pod Security Standards, Secrets management, Workload Identity
- Understanding of Kubernetes observability: metrics-server, kubectl top, container resource monitoring
- Experience debugging common issues: ImagePullBackOff, CrashLoopBackOff, OOMKilled, Evicted pods, pending pods

Cloud & Infrastructure:
- 3+ years of experience with Google Cloud Platform (GCP) services including GKE, Cloud Run, Cloud SQL, Memorystore, Pub/Sub, and Cloud Logging
- Strong experience with Terraform for infrastructure as code (IaC)
- Understanding of cloud networking: VPCs, subnets, firewall rules, Cloud NAT, Private Service Connect

CI/CD & GitOps:
- Proficiency with GitLab CI/CD pipelines
- Experience with ArgoCD or similar GitOps tools
- Understanding of Helm charts and Kustomize for Kubernetes manifest management

Observability & Troubleshooting:
- Experience with monitoring and APM tools (Dynatrace, Datadog, Prometheus, Grafana)
- Ability to analyze logs, metrics, and traces to diagnose production issues
- Familiarity with JVM troubleshooting (heap dumps, thread analysis, GC tuning, connection pool issues)

AI/ML Knowledge:
- Basic understanding of LLM concepts, prompt engineering, and AI model deployment
- Familiarity with AI coding assistants and their integration into development workflows
- Interest in agentic AI systems and autonomous automation tools
- Exposure to vector databases (Pinecone, Weaviate, pgvector) and RAG architectures is a plus

Systems & Networking:
- Strong Linux administration skills
- Understanding of networking concepts (DNS, load balancing, firewalls, TCP/IP)
- Experience with service mesh (Istio) is a plus

General:
- Excellent problem-solving and analytical skills
- Strong written and verbal communication
- Ability to work effectively in a collaborative, cross-functional environment
- Experience working in an Agile/DevOps culture
- Bachelor's degree in Computer Science, Information Technology, or related field (or equivalent experience)

Since opening our first store in 1979, AutoZone has grown into a leading retailer and distributor of automotive parts and accessories across the Americas. Our customer-first mindset and commitment to Going the Extra Mile define who we are, for both our customers and AutoZoners. Working at AutoZone means being part of a team that values dedication, teamwork, and growth. Whether you're helping customers or building your career, we provide tools and support to help you succeed and drive your future.

Benefits at AutoZone

AutoZone offers thoughtful benefits programs with one-on-one benefits guidance designed to improve AutoZoners' physical, mental and financial well-being.

All AutoZoners (Full-Time and Part-Time):
- Competitive pay
- Unrivaled company culture
- Medical, dental and vision plans
- Exclusive discounts and perks, including an AutoZone in-store discount
- 401(k) with company match and Stock Purchase Plan
- AutoZoners Living Well Program for free mental health support
- Opportunities for career growth

Additional Benefits for Full-Time AutoZoners:
- Paid time off
- Life, and short- and long-term disability insurance options
- Health Savings and Flexible Spending Accounts with wellness rewards
- Tuition reimbursement

Minimum age requirements may apply. Eligibility and waiting period requirements may apply; benefits for AutoZoners in Puerto Rico, Hawaii, or the U.S. Virgin Islands may differ. Learn more about all that AutoZone has to offer at Careers.AutoZone.com.

We proudly support Veterans, Active-duty Service Members, Reservists, National Guard and Military Families. Your experience is highly valued, and we encourage you to apply to join our team.

Online Application:
An online application is required. Click the Apply button to complete your application. For step-by-step instructions on how to apply, visit careers.autozone.com/candidateresources.

AutoZone and its subsidiary, ALLDATA, are equal opportunity employers. All applicants will be considered for employment without attention to age, race, color, religion, sex, sexual orientation, gender identity, national origin, veteran or disability status, or any other legally protected categories.

Job Info
- Job Identification: 105932
- Posting Date: 06/01/2026, 11:46 AM
- Job Schedule: Full time
- Locations: 123 S Front St, Memphis, TN, 38103, US