Lead Site Reliability Engineer - North Carolina, Concord Location - Contract Position Available in Cabarrus, North Carolina
Job Description
Lead Site Reliability Engineer - North Carolina, Concord Location - Contract
Title: Lead Site Reliability Engineer
Job Type: Contract
Location: Concord, North Carolina
We are seeking a Lead Site Reliability Engineer (SRE) with deep expertise in AWS networking, infrastructure automation, and production system reliability. This role demands a strong grasp of observability, operational excellence, and the ability to drive the adoption of DevOps/SRE best practices across engineering teams. You will be instrumental in shaping SLIs/SLOs, defining our DevOps maturity roadmap, and building robust, scalable infrastructure using Terraform, Lambda, Step Functions, and more.
You'll be leading a team of SREs and collaborating closely with DevOps, Security, and Application teams to ensure reliable delivery and availability of services.
Key Responsibilities:
Lead and mentor a team of SREs in developing scalable infrastructure and operational processes.
Design and implement SLIs, SLOs, and Error Budgets across critical services and evangelize them across product teams.
Architect and manage AWS networking environments including VPCs, Transit Gateways, Route 53, VPNs, NACLs, and Security Groups.
Manage and monitor Palo Alto and FortiGate firewalls, and integrate them with cloud environments for hybrid network visibility.
Define and evolve DevOps maturity models, guiding teams toward higher automation and reliability.
Build and manage observability dashboards using Grafana, CloudWatch, and Datadog to track application and infrastructure health.
Implement and maintain Infrastructure as Code (IaC) using Terraform to automate cloud deployments across environments.
Develop and maintain serverless applications using AWS Lambda and Step Functions to support platform automation and operations.
Collaborate with developers to define GitLab CI/CD pipelines and streamline the build, test, and deployment lifecycle.
Champion incident response, blameless postmortems, and continuous improvement initiatives.
Write scripts in Python or Bash to automate tasks and integrate systems.
Required Qualifications:
7+ years in SRE, DevOps, or Systems Engineering roles with increasing responsibility.
Proven experience managing AWS production environments with a focus on networking.
In-depth knowledge of Palo Alto and/or FortiGate firewall management and troubleshooting.
Expertise in monitoring and observability tools, including Grafana and Datadog.
Hands-on experience with Terraform in managing cloud infrastructure at scale.
Experience building and deploying serverless architectures using Lambda and Step Functions.
Demonstrated understanding of SLI/SLO design, error budgets, and reliability metrics.
Strong understanding of CI/CD principles and tools like GitLab CI/CD.
Proficiency in scripting using Python or Bash.
Preferred Qualifications:
AWS Certifications (e.g., Solutions Architect, Advanced Networking, DevOps Engineer).
Familiarity with DevOps/SRE maturity models and implementing organizational transformation.
Experience with compliance frameworks (SOC2, ISO 27001, etc.) as they pertain to infrastructure reliability.
Familiarity with container orchestration is a plus.
Soft Skills:
Strong leadership and mentoring capabilities.
Ability to translate complex technical problems into actionable initiatives.
Excellent communication and cross-functional collaboration skills.
Bias for automation and continuous improvement.
Dice Id: 10110049
Position Id: 8675516