Site Reliability Engineer
Tata Consultancy Services Limited
Deerfield, IL (In Person)
Full-Time
Job Description
Must Have Technical/Functional Skills
7+ years of experience in SRE, platform engineering, or cloud infrastructure engineering in large-scale enterprise environments (10,000+ employees or equivalent complexity).
Deep, hands-on expertise with Microsoft Azure, with a minimum of 4 years in a primary Azure cloud engineering role.
Expert-level proficiency with AKS: cluster lifecycle management, RBAC, network policies, pod security standards, cluster autoscaler, and Workload Identity.
Strong infrastructure-as-code skills: Terraform (required) and/or Bicep; experience managing Azure Landing Zones or Enterprise-Scale architecture.
Proficiency in at least one systems programming/scripting language: Python (preferred), Go, or PowerShell.
Experience designing and operating enterprise observability platforms using Azure Monitor, Log Analytics, and Application Insights at scale.
Demonstrable track record of owning SLOs/SLIs and delivering measurable reliability improvements in production.
Strong knowledge of enterprise networking in Azure: Hub-and-Spoke/Virtual WAN, ExpressRoute, Azure Firewall, NSGs, Private Endpoints, and DNS Private Zones.

Required/Preferred Certifications:
AZ-104 | AZ-305 (Preferred) | AZ-400 (Preferred) | CKA | ITIL v4 Foundation

Roles & Responsibilities

Reliability & Availability Engineering
Define, own, and enforce enterprise-wide SLOs, SLIs, and Error Budgets across all Tier-0 and Tier-1 Azure-hosted services; report SLA compliance to executive stakeholders monthly.
Lead architectural reviews for new services and ensure reliability non-functionals (availability targets, RTO/RPO) are embedded from design through to production.
Champion and implement chaos engineering practices using Azure Chaos Studio and custom fault injection frameworks to proactively surface reliability risks.
Drive Disaster Recovery (DR) design and conduct quarterly DR drills across Azure paired regions.

Incident Management & On-Call
Serve as Incident Commander for P1/P2 major incidents; own the end-to-end incident lifecycle from detection through resolution and Post-Incident Review (PIR).
Participate in a structured On-Call rotation with follow-the-sun global coverage; maintain response SLAs of <5 minutes for Tier-0 services.
Drive a blameless post-mortem culture and ensure all action items from PIRs are tracked and delivered within the agreed SLA.

Observability & Platform Engineering
Design and operate the enterprise observability stack: Azure Monitor, Log Analytics Workspaces, Application Insights, and Azure Managed Grafana; ensure full MELT (Metrics, Events, Logs, Traces) coverage.
Build and maintain alerting frameworks using Azure Monitor Alert Rules and Azure Action Groups integrated with PagerDuty and ServiceNow.
Develop and operate platform automation, runbooks, and self-healing capabilities using Azure Automation, Logic Apps, and Python/PowerShell scripting.

CI/CD & Infrastructure Reliability
Collaborate with DevOps and development teams to embed reliability gates into Azure DevOps pipelines: automated performance testing, synthetic monitoring, and progressive deployment (canary/blue-green) strategies.
Manage reliability of AKS clusters across multiple Azure regions; own node pool scaling, upgrade strategy, and cluster hardening in alignment with CIS Benchmarks.
Contribute to infrastructure-as-code reliability reviews using Terraform/Bicep to enforce standards across Azure Landing Zones.

Generic Managerial Skills, If any
Produce monthly reliability dashboards and executive-level reporting aligned to enterprise OKRs and IT Risk frameworks.
Collaborate with the Enterprise Architect and Cloud Governance teams to maintain Azure Policy assignments and ensure operational compliance with ISO 27001, SOC 2, and internal control frameworks.
Mentor junior SREs and engineers across the organization; lead SRE community of practice sessions.

TCS Employee Benefits Summary:
Discretionary Annual Incentive.
Comprehensive Medical Coverage: Medical & Health, Dental & Vision, Disability Planning & Insurance, Pet Insurance Plans.
Family Support: