Cloud Engineer - Observability & SRE
GDH
Remote
$137,280 Salary, Full-Time
Job Description
Role Summary
A senior Cloud Engineer with expertise in building and managing scalable observability and infrastructure platforms for enterprise-level cloud microservices environments. This hybrid role demands hands-on experience with container orchestration, cloud infrastructure automation, and high-volume monitoring systems. The engineer will own end-to-end components, support production operations, and leverage AI tools for system troubleshooting and code generation.

Responsibilities
Design, develop, and operate observability platforms enabling logging, metrics collection, and tracing for cloud-based microservices applications.
Manage and optimize large-scale Kubernetes clusters across multiple regions, including Helm chart management, pod scheduling, and resource tuning.
Own and maintain CI/CD pipelines using tools such as Argo CD, Helm, and GitOps methodologies to ensure reliable deployment workflows.
Implement Infrastructure as Code (IaC) solutions utilizing Terraform on AWS to provision and manage cloud infrastructure at scale.
Operate and maintain monitoring ecosystems including OpenSearch/Elasticsearch, Prometheus, Grafana, Splunk, and Kafka, ensuring high availability and performance.
Develop automation solutions to detect, respond to, and remediate production issues proactively.
Ensure security and compliance by managing vulnerability patching and automating security best practices in container environments.
Collaborate with cross-functional teams to improve system reliability, scalability, and performance, contributing to distributed system design.
Participate in on-call rotations, incident response, and post-incident analysis to uphold SLA commitments.
Utilize AI-assisted coding and troubleshooting tools to accelerate system development, automation, and incident resolution.

Qualifications
Bachelor's degree in Computer Science, Information Technology, or related field.
Minimum of 8 years of experience in DevOps, SRE, or platform engineering roles supporting production cloud environments.
Proven incident response experience, including alert triage, root cause analysis, and SLA management in 24/7 operations.
Expertise in Infrastructure as Code principles with proficiency in Terraform, Ansible, or similar automation tools for cloud provisioning.
Strong scripting skills in Python, Golang, or Bash for automation, tooling, and CI/CD pipeline integration.
Extensive experience operating and troubleshooting large-scale Kubernetes workloads, including Helm chart management and multi-cluster orchestration.
Hands-on knowledge of observability stacks such as OpenSearch, Prometheus, Grafana, Loki, and Splunk, including query optimization and capacity planning.
Familiarity with Kafka and AWS MSK, including cluster operation, topic configuration, and schema management.
Experience deploying, managing, and migrating Splunk Enterprise environments with Kubernetes-based log shipping architectures.
Working knowledge of OpenTelemetry, distributed tracing, and application performance monitoring in cloud environments.
Understanding of security frameworks, container hardening practices, and vulnerability remediation at scale, including standards such as FedRAMP, STIG, IL5, ISO 27001, and SOC 2.
Experience using AI tools like LLMs, GitHub Copilot, or custom AI agents to enhance operational workflows and incident management.
Effective communication skills and the ability to work independently in a hybrid work setting.

Publishing Pay Range:
$65.00 - $67.00 hourly
This position offers a hybrid schedule, with time split between the office and remote work.