Production Engineer Jobs in Plano, TX, US
W3global
Plano, TX (In Person)
Full-Time
Job Description
Our client is seeking a Production Engineer / Site Reliability Engineer (SRE) to support the design, implementation, and maintenance of highly available, scalable, and resilient cloud-based systems. This individual will play a key role in maintaining production stability, improving operational efficiency, and ensuring system reliability across enterprise applications and microservices environments.

The ideal candidate will have experience supporting cloud-native architectures, CI/CD automation, monitoring and observability tools, and production incident management within fast-paced enterprise environments.

Key Responsibilities

Production Engineering & Site Reliability
- Design, implement, and maintain highly available, scalable, and resilient production systems.
- Own end-to-end operational responsibilities including monitoring, incident response, root cause analysis, automation, and capacity planning.
- Troubleshoot and resolve production issues to minimize downtime and improve overall system reliability.
- Collaborate with development, QA, infrastructure, and DevOps teams to streamline deployment processes and operational workflows.
- Support and improve service-level objectives (SLOs) and service-level agreements (SLAs).
- Enforce best practices related to security, compliance, disaster recovery, and operational excellence.
- Develop and maintain operational documentation, deployment procedures, and system runbooks.

Cloud & Microservices Engineering Responsibilities
- Build, deploy, and maintain cloud-native microservices using Java, Spring Boot, and JavaScript technologies.
- Design and implement RESTful APIs and event-driven architectures using AWS services such as Lambda, ECS/EKS, SQS, and SNS.
- Develop and maintain CI/CD pipelines using Jenkins, GitLab CI, AWS CodePipeline, or similar tools to support automated testing and deployments.
- Monitor application and infrastructure performance using AWS CloudWatch, Prometheus, Grafana, and distributed tracing tools such as Jaeger or AWS X-Ray.
- Perform root cause analysis and implement solutions to improve production stability and reliability.
- Implement and maintain security controls including IAM roles, OAuth2, JWT authentication, and encryption standards for data in transit and at rest.
- Collaborate cross-functionally to design fault-tolerant, resilient systems with automated failover and recovery capabilities.
- Optimize cloud resource utilization and cost efficiency through rightsizing and autoscaling strategies.
- Automate operational tasks and incident response processes using scripting and Infrastructure as Code tools such as Terraform and CloudFormation.

Required Qualifications
- 3-4+ years of experience in Production Engineering, Site Reliability Engineering (SRE), DevOps, or related production support roles.
- Hands-on experience with Java, JavaScript, Spring Boot, and cloud-native microservices architectures.
- Strong experience working with AWS cloud services including Lambda, ECS/EKS, SQS, and SNS.
- Experience developing and maintaining CI/CD pipelines using Jenkins, GitLab CI, AWS CodePipeline, or similar tools.
- Familiarity with monitoring and observability tools including CloudWatch, Prometheus, Grafana, Jaeger, and AWS X-Ray.
- Experience troubleshooting production issues and conducting root cause analysis in enterprise environments.
- Knowledge of Infrastructure as Code tools such as Terraform and CloudFormation.
- Understanding of security best practices, disaster recovery processes, and system reliability principles.
- Strong communication, collaboration, and documentation skills.

Preferred Qualifications
- Experience supporting large-scale, cloud-based enterprise applications.
- Exposure to highly available and fault-tolerant distributed systems.
- Experience improving operational efficiency through automation and observability initiatives.
- Familiarity with Agile development and DevOps methodologies.