Senior Site Reliability Engineer

PowerPlan, Inc

Remote

Full-Time

Posted 2 days ago (Updated 12 hours ago) • Actively hiring

Expires 6/13/2026

Job Description

Overview

This is a principal-level individual contributor role at the heart of our cloud platform's reliability, scalability, and operational maturity. You will work hands-on across AWS and Azure environments, solving complex production problems while systematically eliminating the manual toil that creates them. The role offers significant autonomy, deep technical impact, and the opportunity to shape how reliability engineering is practiced across the organization.

COMPANY

PowerPlan operates a growing SaaS platform supporting enterprise customers with mission-critical workloads. We run complex, multi-cloud environments and value engineers who take ownership, think in systems, and build solutions that scale. Our culture emphasizes operational excellence, blameless learning, and collaboration across Engineering, Support, Professional Services, and Product teams.

Responsibilities

KEY PERFORMANCE OBJECTIVES

(First 12 Months)

OBJECTIVE 1

Platform Familiarity Through Escalations & Early Automation (First 90 Days)

Outcome:

Within 90 days, resolve escalated infrastructure cases across major AWS and Azure services and deliver 2-3 targeted automations that measurably reduce manual resolution time for recurring issues.

Impact:

Accelerates ramp-up, demonstrates immediate value, and establishes the expectation that operational issues are systematically automated rather than repeatedly handled manually.

How:

Work directly on escalated cases from Support and Professional Services, document manual resolution steps, identify repeatable patterns, and implement focused Python or PowerShell automations tied to high-frequency workflows.

OBJECTIVE 2

Eliminate Top Sources of Operational Toil (3-6 Months)

Outcome:

Within 3-6 months, eliminate or significantly reduce manual intervention for the top 5-7 highest-frequency operational issues through automation, self-service tooling, or infrastructure improvements.

Impact:

Reduces support load, improves service stability, and frees Cloud Engineering capacity for higher-value reliability and platform initiatives.

How:

Analyze case and incident data, prioritize automation candidates by frequency and impact, build production-grade automations and runbooks, and partner with Support and PS teams to validate adoption and effectiveness.

OBJECTIVE 3

Mature Incident Response & Post-Incident Learning (6-9 Months)

Outcome:

By month 9, establish a consistent, high-quality incident response and post-incident review process resulting in faster containment, clearer ownership, and tracked corrective actions for all critical production incidents.

Impact:

Reduces repeat incidents, improves on-call effectiveness, and increases organizational confidence during high-severity events.

How:

Lead critical incidents, standardize incident runbooks, facilitate blameless postmortems, track follow-up actions to completion, and coach teams on effective incident communication and decision-making.

OBJECTIVE 4

Deliver a Mature, SLO-Aligned Observability Platform (9-12 Months)

Outcome:

By month 12, deliver a mature observability layer across AWS and Azure with service-level dashboards, tuned alerts, and clear SLI/SLO reporting actively used by on-call and engineering teams.

Impact:

Improves detection, diagnosis, and prevention of production issues while reducing alert fatigue and enabling data-driven reliability decisions.

How:

Design Grafana dashboards aligned to service health and user journeys, integrate metrics, logs, and traces from core platforms, tune alert thresholds, and embed observability into CI/CD and incident workflows.

Qualifications

WHAT YOU BRING

Deep hands-on experience operating production systems in AWS and Azure environments
Strong automation skills using Python and PowerShell in operational contexts
Proven ability to identify repetitive operational work and eliminate it through automation
Experience leading incident response and blameless post-incident reviews
Strong observability expertise, particularly with Grafana and SLI/SLO-driven monitoring
Ability to influence engineering practices without formal authority
Clear written and verbal communication skills across technical and non-technical audiences

PowerPlan is an EOE

Applicant and Candidate Privacy Notice

Please note that this is a hybrid role that involves a combination of onsite work from our corporate office as well as work from home. While we strive to accommodate flexible working arrangements when sensible, there will be times when onsite work is required. This could include scheduled office days, team meetings, client meetings, or special events.
