Site Reliability Engineer – Remote / Telecommute Position Available In Wake, North Carolina
Tallo's Job Summary: The Site Reliability Engineer position offers remote work flexibility and requires coordinating system refreshes, managing Azure policies and security, overseeing migrations, conducting resiliency testing and audits, and providing mentoring and training. The role demands deep Azure knowledge, troubleshooting skills, scripting capabilities, incident management experience, and familiarity with Epic implementations. Dice is recruiting for this position.
Job Description:
Coordinate system refreshes, restore tests, and DR failovers, especially for Epic or other mission-critical applications.
Own P1/P2 escalations when L2 support cannot resolve them, and lead major-incident war rooms (root cause analysis, post-incident reviews).
Azure Policies, Security and Network:
Enforce Azure Policies and RBAC; manage vulnerability scans (Microsoft Defender or other tools) and patch any weaknesses discovered.
Update and maintain firewall rules, network security groups (NSGs), and other network security baselines.
Migrations and Decommissioning:
Oversee complex migrations between environments or Azure regions, sometimes involving re-platforming or re-architecting.
Perform advanced data snapshot validations and coordinate system retirement and decommissioning tasks.
Resiliency Testing and Audits:
Plan and execute chaos engineering exercises, including rollback or failback scenarios.
Conduct regular security audits, ensuring that any deviations from compliance are documented and remediated.
Mentoring and Training Responsibilities:
Deliver quarterly training sessions to L2 support staff on new tools, processes, or changes to the environment.
Act as final escalation point for complex or unknown technical issues.
Skills Set:
Deep knowledge of Azure services, including network configuration, VM management, storage, AD, DNS, and security controls.
Ability to architect and troubleshoot large, complex environments, both manually and with automated tools.
Strong scripting or automation capabilities (PowerShell, Azure CLI) for large-scale patching or configuration updates.
Experience in incident management, root cause analysis, and producing postmortem reviews.
Familiarity with Epic implementations (on-prem / cloud).
Employers have access to artificial intelligence language tools (“AI”) that help generate and enhance job descriptions and AI may have been used to create this description. The position description has been reviewed for accuracy and Dice believes it to correctly reflect the job opportunity.
Dice Id: 10516350
Position Id: 2025-218742